WO 2004/098539 PCT/US2004/009215

KINASES AND PHOSPHATASES 

TECHNICAL FIELD 
The invention relates to novel nucleic acids, kinases and phosphatases encoded by these 
nucleic acids, and to the use of these nucleic acids and proteins in the diagnosis, treatment, and
prevention of cardiovascular diseases, immune system disorders, neurological disorders, disorders
affecting growth and development, lipid disorders, cell proliferative disorders, and cancers. The
invention also relates to the assessment of the effects of exogenous compounds on the expression of
nucleic acids and kinases and phosphatases. 

BACKGROUND OF THE INVENTION 
Reversible protein phosphorylation is the ubiquitous strategy used to control many of the
intracellular events in eukaryotic cells. It is estimated that more than ten percent of proteins active in
a typical mammalian cell are phosphorylated. Kinases catalyze the transfer of high-energy phosphate
groups from adenosine triphosphate (ATP) to target proteins on the hydroxyamino acid residues
serine, threonine, or tyrosine. Phosphatases, in contrast, remove these phosphate groups.
Extracellular signals including hormones, neurotransmitters, and growth and differentiation factors
can activate kinases, which can occur as cell surface receptors or as the activator of the final effector
protein, as well as other locations along the signal transduction pathway. Cascades of kinases occur,
as well as kinases sensitive to second messenger molecules. This system allows for the amplification
of weak signals (low abundance growth factor molecules, for example), as well as the synthesis of
many weak signals into an all-or-nothing response. Phosphatases, then, are essential in determining
the extent of phosphorylation in the cell and, together with kinases, regulate key cellular processes
such as metabolic enzyme activity, proliferation, cell growth and differentiation, cell adhesion, and
cell cycle progression.
KINASES 

Kinases comprise the largest known enzyme superfamily and vary widely in their target
molecules. Kinases catalyze the transfer of high energy phosphate groups from a phosphate donor to
a phosphate acceptor. Nucleotides usually serve as the phosphate donor in these reactions, with most
kinases utilizing adenosine triphosphate (ATP). The phosphate acceptor can be any of a variety of
molecules, including nucleosides, nucleotides, lipids, carbohydrates, and proteins. Proteins are
phosphorylated on hydroxyamino acids. Addition of a phosphate group alters the local charge on the
acceptor molecule, causing internal conformational changes and potentially influencing
intermolecular contacts. Reversible protein phosphorylation is the primary method for regulating
protein activity in eukaryotic cells. In general, proteins are activated by phosphorylation in response
to extracellular signals such as hormones, neurotransmitters, and growth and differentiation factors.
The activated proteins initiate the cell's intracellular response by way of intracellular signaling
pathways and second messenger molecules such as cyclic nucleotides, calcium-calmodulin, inositol,
and various mitogens, that regulate protein phosphorylation.

Kinases are involved in all aspects of a cell's function, from basic metabolic processes, such
as glycolysis, to cell-cycle regulation, differentiation, and communication with the extracellular
environment through signal transduction cascades. Inappropriate phosphorylation of proteins in cells
has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle
have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked
to diseases and disorders of the reproductive system, immune system, and skeletal muscle.

There are two classes of protein kinases. One class, protein tyrosine kinases (PTKs),
phosphorylates tyrosine residues, and the other class, protein serine/threonine kinases (STKs),
phosphorylates serine and threonine residues. Some PTKs and STKs possess structural
characteristics of both families and have dual specificity for both tyrosine and serine/threonine
residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing
specific residues and sequence motifs characteristic of the kinase family. The protein kinase catalytic
domain can be further divided into 11 subdomains. N-terminal subdomains I-IV fold into a two-lobed
structure which binds and orients the ATP donor molecule, and subdomain V spans the two lobes.
C-terminal subdomains VI-XI bind the protein substrate and transfer the gamma phosphate from ATP to
the hydroxyl group of a tyrosine, serine, or threonine residue. Each of the 11 subdomains contains
specific catalytic residues or amino acid motifs characteristic of that subdomain. For example,
subdomain I contains an 8-amino acid glycine-rich ATP binding consensus motif, subdomain II
contains a critical lysine residue required for maximal catalytic activity, and subdomains VI through
IX comprise the highly conserved catalytic core. PTKs and STKs also contain distinct sequence
motifs in subdomains VI and VIII which may confer hydroxyamino acid specificity.

In addition, kinases may also be classified by additional amino acid sequences, generally
between 5 and 100 residues, which either flank or occur within the kinase domain. These additional
amino acid sequences regulate kinase activity and determine substrate specificity. (Reviewed in
Hardie, G. and S. Hanks (1995) The Protein Kinase Facts Book, Vol I, pp. 17-20 Academic Press,
San Diego CA). In particular, two protein kinase signature sequences have been identified in the
kinase domain, the first containing an active site lysine residue involved in ATP binding, and the
second containing an aspartate residue important for catalytic activity. If a protein analyzed includes
the two protein kinase signatures, the probability of that protein being a protein kinase is close to
100% (PROSITE: PDOC00100, November 1995).

Protein Tyrosine Kinases 

Protein tyrosine kinases (PTKs) may be classified as either transmembrane, receptor PTKs or
nontransmembrane, nonreceptor PTK proteins. Transmembrane tyrosine kinases function as
receptors for most growth factors. Growth factors bind to the receptor tyrosine kinase (RTK), which
causes the receptor to phosphorylate itself (autophosphorylation) and specific intracellular second
messenger proteins. Growth factors (GF) that associate with receptor PTKs include epidermal GF,
platelet-derived GF, fibroblast GF, hepatocyte GF, insulin and insulin-like GFs, nerve GF, vascular
endothelial GF, and macrophage colony stimulating factor.

Nontransmembrane, nonreceptor PTKs lack transmembrane regions and, instead, form
signaling complexes with the cytosolic domains of plasma membrane receptors. Receptors that
function through non-receptor PTKs include those for cytokines and hormones (growth hormone and
prolactin), and antigen-specific receptors on T and B lymphocytes.

Many PTKs were first identified as oncogene products in cancer cells in which PTK
activation was no longer subject to normal cellular controls. In fact, about one third of the known
oncogenes encode PTKs. Furthermore, cellular transformation (oncogenesis) is often accompanied
by increased tyrosine phosphorylation activity (Charbonneau, H. and N.K. Tonks (1992) Annu. Rev.
Cell Biol. 8:463-493). Regulation of PTK activity may therefore be an important strategy in
controlling some types of cancer.
Protein Serine/Threonine Kinases 
Protein serine/threonine kinases (STKs) are nontransmembrane proteins. A subclass of STKs
are known as ERKs (extracellular signal regulated kinases) or MAPs (mitogen-activated protein
kinases) and are activated after cell stimulation by a variety of hormones and growth factors. Cell
stimulation induces a signaling cascade leading to phosphorylation of MEK (MAP/ERK kinase)
which, in turn, activates ERK via serine and threonine phosphorylation. A varied number of proteins
represent the downstream effectors for the active ERK and implicate it in the control of cell
proliferation and differentiation, as well as regulation of the cytoskeleton. Activation of ERK is
normally transient, and cells possess dual specificity phosphatases that are responsible for its
down-regulation. Also, numerous studies have shown that elevated ERK activity is associated with some
cancers. Other STKs include the second messenger dependent protein kinases such as the
cyclic-AMP dependent protein kinases (PKA), calcium-calmodulin (CaM) dependent protein kinases,
and the mitogen-activated protein kinases (MAP); the cyclin-dependent protein kinases; checkpoint
and cell cycle kinases; Numb-associated kinase (Nak); human Fused (hFu); proliferation-related
kinases; 5'-AMP-activated protein kinases; and kinases involved in apoptosis.

One member of the ERK family of MAP kinases, ERK7, is a novel 61-kDa protein that has
motif similarities to ERK1 and ERK2, but is not activated by extracellular stimuli as are ERK1 and
ERK2 nor by the common activators, c-Jun N-terminal kinase (JNK) and p38 kinase. ERK7 regulates
its nuclear localization and inhibition of growth through its C-terminal tail, not through the kinase
domain as is typical with other MAP kinases (Abe, M.K. (1999) Mol. Cell. Biol. 19:1301-1312).

The second messenger dependent protein kinases primarily mediate the effects of second
messengers such as cyclic AMP (cAMP), cyclic GMP, inositol triphosphate, phosphatidylinositol
3,4,5-triphosphate, cyclic ADP ribose, arachidonic acid, diacylglycerol and calcium-calmodulin. The
PKAs are involved in mediating hormone-induced cellular responses and are activated by cAMP
produced within the cell in response to hormone stimulation. cAMP is an intracellular mediator of
hormone action in all animal cells that have been studied. Hormone-induced cellular responses
include thyroid hormone secretion, cortisol secretion, progesterone secretion, glycogen breakdown,
bone resorption, and regulation of heart rate and force of heart muscle contraction. PKA is found in
all animal cells and is thought to account for the effects of cAMP in most of these cells. Altered PKA
expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders,
diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K.J. et al. (1994) Harrison's
Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887).

The casein kinase I (CKI) gene family is another subfamily of serine/threonine protein
kinases. This continuously expanding group of kinases has been implicated in the regulation of
numerous cytoplasmic and nuclear processes, including cell metabolism and DNA replication and
repair. CKI enzymes are present in the membranes, nucleus, cytoplasm and cytoskeleton of
eukaryotic cells, and on the mitotic spindles of mammalian cells (Fish, K.J. et al. (1995) J. Biol.
Chem. 270:14875-14883).

The CKI family members all have a short amino-terminal domain of 9-76 amino acids, a
highly conserved kinase domain of 284 amino acids, and a variable carboxyl-terminal domain that
ranges from 24 to over 200 amino acids in length (Cegielska, A. et al. (1998) J. Biol. Chem.
273:1357-1364). The CKI family is comprised of highly related proteins, as seen by the identification
of isoforms of casein kinase I from a variety of sources. There are at least five mammalian isoforms,
α, β, γ, δ, and ε. Fish et al. identified CKI-epsilon from a human placenta cDNA library. It is a basic
protein of 416 amino acids and is closest to CKI-delta. Through recombinant expression, it was
determined to phosphorylate known CKI substrates and was inhibited by the CKI-specific inhibitor
CKI-7. The human gene for CKI-epsilon was able to rescue yeast with a slow-growth phenotype
caused by deletion of the yeast CKI locus, HRR250 (Fish et al., supra).

The mammalian circadian mutation tau was found to be a semidominant autosomal allele of
CKI-epsilon that markedly shortens period length of circadian rhythms in Syrian hamsters. The tau
locus is encoded by casein kinase I-epsilon, which is also a homolog of the Drosophila circadian gene
double-time. Studies of both the wildtype and tau mutant CKI-epsilon enzyme indicated that the
mutant enzyme has a noticeable reduction in the maximum velocity and autophosphorylation state.
Further, in vitro, CKI-epsilon is able to interact with mammalian PERIOD proteins, while the mutant
enzyme is deficient in its ability to phosphorylate PERIOD. Lowrey et al. have proposed that
CKI-epsilon plays a major role in delaying the negative feedback signal within the
transcription-translation-based autoregulatory loop that composes the core of the circadian mechanism. Therefore
the CKI-epsilon enzyme is an ideal target for pharmaceutical compounds influencing circadian
rhythms, jet-lag and sleep, in addition to other physiologic and metabolic processes under circadian
regulation (Lowrey, P.L. et al. (2000) Science 288:483-491).

Homeodomain-interacting protein kinases (HIPKs) are serine/threonine kinases and novel
members of the DYRK kinase subfamily (Hofmann, T.G. et al. (2000) Biochimie 82:1123-1127).
HIPKs contain a conserved protein kinase domain separated from a domain that interacts with
homeoproteins. HIPKs are nuclear kinases, and HIPK2 is highly expressed in neuronal tissue (Kim,
Y.H. et al. (1998) J. Biol. Chem. 273:25875-25879; Wang, Y. et al. (2001) Biochim. Biophys. Acta
1518:168-172). HIPKs act as corepressors for homeodomain transcription factors. This corepressor
activity is seen in posttranslational modifications such as ubiquitination and phosphorylation, each of
which is important in the regulation of cellular protein function (Kim, Y.H. et al. (1999) Proc. Natl.
Acad. Sci. USA 96:12350-12355).

The human h-warts protein, a homolog of Drosophila warts tumor suppressor gene, maps to
chromosome 6q24-25.1. It has a serine/threonine kinase domain and is localized to centrosomes in
interphase cells. It is involved in mitosis and functions as a component of the mitotic apparatus
(Nishiyama, Y. et al. (1999) FEBS Lett. 459:159-165).
Calcium-Calmodulin Dependent Protein Kinases

Calcium-calmodulin dependent (CaM) kinases are involved in regulation of smooth muscle
contraction, glycogen breakdown (phosphorylase kinase), and neurotransmission (CaM kinase I and
CaM kinase II). CaM dependent protein kinases are activated by calmodulin, an intracellular calcium
receptor, in response to the concentration of free calcium in the cell. Many CaM kinases are also
activated by phosphorylation. Some CaM kinases are also activated by autophosphorylation or by
other regulatory kinases. CaM kinase I phosphorylates a variety of substrates including the
neurotransmitter-related proteins synapsin I and II, the gene transcription regulator, CREB, and the
cystic fibrosis conductance regulator protein, CFTR (Haribabu, B. et al. (1995) EMBO J.
14:3679-3686). CaM kinase II also phosphorylates synapsin at different sites and controls the synthesis of
catecholamines in the brain through phosphorylation and activation of tyrosine hydroxylase. CaM
kinase II controls the synthesis of catecholamines and serotonin, through phosphorylation/activation
of tyrosine hydroxylase and tryptophan hydroxylase, respectively (Fujisawa, H. (1990) BioEssays
12:27-29). The mRNA encoding a calmodulin-binding protein kinase-like protein was found to be
enriched in mammalian forebrain. This protein is associated with vesicles in both axons and
dendrites and accumulates largely postnatally. The amino acid sequence of this protein is similar to
CaM-dependent STKs, and the protein binds calmodulin in the presence of calcium (Godbout, M. et
al. (1994) J. Neurosci. 14:1-13).

Mitogen-Activated Protein Kinases

The mitogen-activated protein kinases (MAP), which mediate signal transduction from the
cell surface to the nucleus via phosphorylation cascades, are another STK family that regulates
intracellular signaling pathways. Several subgroups have been identified, and each manifests
different substrate specificities and responds to distinct extracellular stimuli (Egan, S.E. and R.A.
Weinberg (1993) Nature 365:781-783). There are three kinase modules comprising the MAP kinase
cascade: MAPK (MAP), MAPK kinase (MAP2K, MAPKK, or MKK), and MKK kinase (MAP3K,
MAPKKK, or MEKK) (Wang, X.S. et al. (1998) Biochem. Biophys. Res. Commun. 253:33-37). The
extracellular-regulated kinase (ERK) pathway is activated by growth factors and mitogens, for
example, epidermal growth factor (EGF), ultraviolet light, hyperosmolar medium, heat shock, or
endotoxic lipopolysaccharide (LPS). The closely related though distinct parallel pathways, the c-Jun
N-terminal kinase (JNK), or stress-activated kinase (SAPK) pathway, and the p38 kinase pathway are
activated by stress stimuli and proinflammatory cytokines such as tumor necrosis factor (TNF) and
interleukin-1 (IL-1). Altered MAP kinase expression is implicated in a variety of disease conditions
including cancer, inflammation, immune disorders, and disorders affecting growth and development.
MAP kinase signaling pathways are present in mammalian cells as well as in yeast.
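
The three-module organization of the MAP kinase cascade, and the signal amplification that kinase cascades allow, can be illustrated with a minimal sketch. The tier names are taken from the text. The per-tier gain and the molecule counts are hypothetical values chosen only for illustration, and the function name `amplify` is likewise an assumption.

```python
# Tier names follow the text (MAP3K -> MAP2K -> MAPK).
MAPK_CASCADE = ("MAP3K", "MAP2K", "MAPK")


def amplify(stimulus_molecules: int, gain_per_tier: int) -> dict:
    """Count activated kinase molecules at each tier of the cascade.

    Assumption (not from the source): every active upstream molecule
    activates `gain_per_tier` molecules of the next kinase in the cascade.
    """
    activated = {}
    upstream = stimulus_molecules
    for tier in MAPK_CASCADE:
        upstream = upstream * gain_per_tier
        activated[tier] = upstream
    return activated


if __name__ == "__main__":
    # Two stimulus molecules with a hypothetical gain of 10 per tier.
    print(amplify(stimulus_molecules=2, gain_per_tier=10))
```

Under these assumed numbers, a weak upstream signal grows by the gain at each of the three tiers, which is the qualitative point of a cascade. Real cascades are neither strictly multiplicative nor free of feedback.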
Cyclin-Dependent Protein Kinases

The cyclin-dependent protein kinases (CDKs) are STKs that control the progression of cells
through the cell cycle. The entry and exit of a cell from mitosis are regulated by the synthesis and
destruction of a family of activating proteins called cyclins. Cyclins are small regulatory proteins that
bind to and activate CDKs, which then phosphorylate and activate selected proteins involved in the
mitotic process. CDKs are unique in that they require multiple inputs to become activated. In
addition to cyclin binding, CDK activation requires the phosphorylation of a specific threonine
residue and the dephosphorylation of a specific tyrosine residue on the CDK.
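
The multiple-input requirement for CDK activation described above can be written as a simple logical conjunction. The following is a minimal sketch: the class and field names are hypothetical, and the boolean model ignores kinetics and partial activity.

```python
from dataclasses import dataclass


@dataclass
class CdkState:
    """Hypothetical record of the three inputs named in the text."""
    cyclin_bound: bool
    threonine_phosphorylated: bool  # the specific activating threonine
    tyrosine_phosphorylated: bool   # the specific tyrosine that must be dephosphorylated


def cdk_is_active(state: CdkState) -> bool:
    # Active only when all three conditions hold: cyclin is bound, the
    # threonine carries a phosphate, and the tyrosine does not.
    return (
        state.cyclin_bound
        and state.threonine_phosphorylated
        and not state.tyrosine_phosphorylated
    )


if __name__ == "__main__":
    print(cdk_is_active(CdkState(True, True, False)))  # all inputs satisfied
    print(cdk_is_active(CdkState(True, True, True)))   # tyrosine still phosphorylated
```

Any single missing input leaves the kinase inactive in this model, which is what distinguishes CDKs from kinases that respond to one signal.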

Another family of STKs associated with the cell cycle are the NIMA (never in mitosis)-related
kinases (Neks). Both CDKs and Neks are involved in duplication, maturation, and separation
of the microtubule organizing center, the centrosome, in animal cells (Fry, A.M. et al. (1998) EMBO
J. 17:470-481).

Checkpoint and Cell Cycle Kinases

In the process of cell division, the order and timing of cell cycle transitions are under control
of cell cycle checkpoints, which ensure that critical events such as DNA replication and chromosome
segregation are carried out with precision. If DNA is damaged, e.g. by radiation, a checkpoint
pathway is activated that arrests the cell cycle to provide time for repair. If the damage is extensive,
apoptosis is induced. In the absence of such checkpoints, the damaged DNA is inherited by aberrant
cells which may cause proliferative disorders such as cancer. Protein kinases play an important role
in this process. For example, a specific kinase, checkpoint kinase 1 (Chk1), has been identified in
yeast and mammals, and is activated by DNA damage in yeast. Activation of Chk1 leads to the arrest
of the cell at the G2/M transition (Sanchez, Y. et al. (1997) Science 277:1497-1501). Specifically,
Chk1 phosphorylates the cell division cycle phosphatase CDC25, inhibiting its normal function which
is to dephosphorylate and activate the cyclin-dependent kinase Cdc2. Cdc2 activation controls the
entry of cells into mitosis (Peng, C.-Y. et al. (1997) Science 277:1501-1505). Thus, activation of
Chk1 prevents the damaged cell from entering mitosis. A deficiency in a checkpoint kinase, such as
Chk1, may also contribute to cancer by failure to arrest cells with damaged DNA at other checkpoints
such as G2/M.
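
The chain of inhibition in the Chk1 pathway (Chk1 inhibits CDC25, which otherwise activates Cdc2) can be traced step by step in a short sketch. The function and parameter names are hypothetical, and each protein is reduced to an on/off state, which is a deliberate simplification of the pathway described above.

```python
def enters_mitosis(dna_damaged: bool, chk1_functional: bool = True) -> bool:
    """Boolean walk through the G2/M checkpoint steps named in the text."""
    # Chk1 is activated by DNA damage, provided the kinase is functional.
    chk1_active = dna_damaged and chk1_functional
    # Active Chk1 phosphorylates CDC25 and thereby inhibits it.
    cdc25_active = not chk1_active
    # CDC25 dephosphorylates and activates Cdc2.
    cdc2_active = cdc25_active
    # Cdc2 activation controls entry into mitosis.
    return cdc2_active


if __name__ == "__main__":
    print(enters_mitosis(dna_damaged=False))
    print(enters_mitosis(dna_damaged=True))
    print(enters_mitosis(dna_damaged=True, chk1_functional=False))
```

The third call models a checkpoint kinase deficiency: the damaged cell is not arrested, consistent with the proposed contribution of such a deficiency to cancer.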

Proliferation-Related Kinases

Proliferation-related kinase is a serum/cytokine inducible STK that is involved in regulation
of the cell cycle and cell proliferation in human megakaryocytic cells (Li, B. et al. (1996) J. Biol.
Chem. 271:19402-19408). Proliferation-related kinase is related to the polo (derived from
Drosophila polo gene) family of STKs implicated in cell division. Proliferation-related kinase is
downregulated in lung tumor tissue and may be a proto-oncogene whose deregulated expression in
normal tissue leads to oncogenic transformation.
5'-AMP-activated protein kinase

A ligand-activated STK protein kinase is 5'-AMP-activated protein kinase (AMPK) (Gao, G.
et al. (1996) J. Biol. Chem. 271:8675-8681). Mammalian AMPK is a regulator of fatty acid and sterol
synthesis through phosphorylation of the enzymes acetyl-CoA carboxylase and
hydroxymethylglutaryl-CoA reductase and mediates responses of these pathways to cellular stresses
such as heat shock and depletion of glucose and ATP. AMPK is a heterotrimeric complex comprised
of a catalytic alpha subunit and two non-catalytic beta and gamma subunits that are believed to
regulate the activity of the alpha subunit. Subunits of AMPK have a much wider distribution in
non-lipogenic tissues such as brain, heart, spleen, and lung than expected. This distribution suggests
that its role may extend beyond regulation of lipid metabolism alone.
Kinases in Apoptosis 

Apoptosis is a highly regulated signaling pathway leading to cell death that plays a crucial
role in tissue development and homeostasis. Deregulation of this process is associated with the
pathogenesis of a number of diseases including autoimmune diseases, neurodegenerative disorders,
and cancer. Various STKs play key roles in this process. ZIP kinase is an STK containing a
C-terminal leucine zipper domain in addition to its N-terminal protein kinase domain. This
C-terminal domain appears to mediate homodimerization and activation of the kinase as well as
interactions with transcription factors such as activating transcription factor, ATF4, a member of the
cyclic-AMP responsive element binding protein (ATF/CREB) family of transcriptional factors
(Sanjo, H. et al. (1998) J. Biol. Chem. 273:29066-29071). DRAK1 and DRAK2 are STKs that share
homology with the death-associated protein kinases (DAP kinases), known to function in interferon-γ
induced apoptosis (Sanjo et al., supra). Like ZIP kinase, DAP kinases contain a C-terminal
protein-protein interaction domain, in the form of ankyrin repeats, in addition to the N-terminal
kinase domain. ZIP, DAP, and DRAK kinases induce morphological changes associated with
apoptosis when transfected into NIH3T3 cells (Sanjo et al., supra). However, deletion of either the
N-terminal kinase catalytic domain or the C-terminal domain of these proteins abolishes apoptosis
activity, indicating that in addition to the kinase activity, activity in the C-terminal domain is also
necessary for apoptosis, possibly as an interacting domain with a regulator or a specific substrate.

RICK is another STK recently identified as mediating a specific apoptotic pathway involving
the death receptor, CD95 (Inohara, N. et al. (1998) J. Biol. Chem. 273:12296-12300). CD95 is a
member of the tumor necrosis factor receptor superfamily and plays a critical role in the regulation
and homeostasis of the immune system (Nagata, S. (1997) Cell 88:355-365). The CD95 receptor
signaling pathway involves recruitment of various intracellular molecules to a receptor complex
following ligand binding. This process includes recruitment of the cysteine protease caspase-8
which, in turn, activates a caspase cascade leading to cell death. RICK is composed of an N-terminal
kinase catalytic domain and a C-terminal "caspase-recruitment" domain that interacts with
caspase-like domains, indicating that RICK plays a role in the recruitment of caspase-8. This
interpretation is supported by the fact that the expression of RICK in human 293T cells promotes
activation of caspase-8 and potentiates the induction of apoptosis by various proteins involved in the
CD95 apoptosis pathway (Inohara et al., supra).
Mitochondrial Protein Kinases 

A novel class of eukaryotic kinases, related by sequence to prokaryotic histidine protein
kinases, are the mitochondrial protein kinases (MPKs) which seem to have no sequence similarity
with other eukaryotic protein kinases. These protein kinases are located exclusively in the
mitochondrial matrix space and may have evolved from genes originally present in
respiration-dependent bacteria which were endocytosed by primitive eukaryotic cells. MPKs are responsible for
phosphorylation and inactivation of the branched-chain alpha-ketoacid dehydrogenase and pyruvate
dehydrogenase complexes (Harris, R.A. et al. (1995) Adv. Enzyme Regul. 34:147-162). Five MPKs
have been identified. Four members correspond to pyruvate dehydrogenase kinase isozymes,
regulating the activity of the pyruvate dehydrogenase complex, which is an important regulatory
enzyme at the interface between glycolysis and the citric acid cycle. The fifth member corresponds to
a branched-chain alpha-ketoacid dehydrogenase kinase, important in the regulation of the pathway for
the disposal of branched-chain amino acids (Harris, R.A. et al. (1997) Adv. Enzyme Regul.
37:271-293). Both starvation and the diabetic state are known to result in a great increase in the activity of
the pyruvate dehydrogenase kinase in the liver, heart and muscle of the rat. This increase contributes
in both disease states to the phosphorylation and inactivation of the pyruvate dehydrogenase complex
and conservation of pyruvate and lactate for gluconeogenesis (Harris (1995) supra).

KINASES WITH NON-PROTEIN SUBSTRATES 

Lipid and Inositol kinases

Lipid kinases phosphorylate hydroxyl residues on lipid head groups. A family of kinases
involved in phosphorylation of phosphatidylinositol (PI) has been described, each member
phosphorylating a specific carbon on the inositol ring (Leevers, S.J. et al. (1999) Curr. Opin. Cell.
Biol. 11:219-225). The phosphorylation of phosphatidylinositol is involved in activation of the
protein kinase C signaling pathway. The inositol phospholipids (phosphoinositides) intracellular
signaling pathway begins with binding of a signaling molecule to a G-protein linked receptor in the
plasma membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the
inner side of the plasma membrane by inositol kinases, thus converting PI residues to the biphosphate
state (PIP2). PIP2 is then cleaved into inositol triphosphate (IP3) and diacylglycerol. These two
products act as mediators for separate signaling pathways. Cellular responses that are mediated by
these pathways are glycogen breakdown in the liver in response to vasopressin, smooth muscle
contraction in response to acetylcholine, and thrombin-induced platelet aggregation.

PI 3-kinase (PI3K), which phosphorylates the D3 position of PI and its derivatives, has a
central role in growth factor signal cascades involved in cell growth, differentiation, and metabolism.
PI3K is a heterodimer consisting of an adapter subunit and a catalytic subunit. The adapter subunit
acts as a scaffolding protein, interacting with specific tyrosine-phosphorylated proteins, lipid
moieties, and other cytosolic factors. When the adapter subunit binds tyrosine phosphorylated
targets, such as the insulin responsive substrate (IRS)-1, the catalytic subunit is activated and converts
PI (4,5) bisphosphate (PIP2) to PI (3,4,5) P3 (PIP3). PIP3 then activates a number of other proteins,
including PKA, protein kinase B (PKB), protein kinase C (PKC), glycogen synthase kinase (GSK)-3,
and p70 ribosomal S6 kinase. PI3K also interacts directly with the cytoskeletal organizing proteins,
Rac, rho, and cdc42 (Shepherd, P.R. et al. (1998) Biochem. J. 333:471-490). Animal models for
diabetes, such as obese and fat mice, have altered PI3K adapter subunit levels. Specific mutations in
the adapter subunit have also been found in an insulin-resistant Danish population, suggesting a role
for PI3K in type-2 diabetes (Shepherd, supra).

An example of lipid kinase phosphorylation activity is the phosphorylation of
D-erythro-sphingosine to the sphingolipid metabolite, sphingosine-1-phosphate (SPP). SPP has
emerged as a novel lipid second-messenger with both extracellular and intracellular actions (Kohama,
T. et al. (1998) J. Biol. Chem. 273:23722-23728). Extracellularly, SPP is a ligand for the G-protein
coupled receptor EDG-1 (endothelial-derived, G-protein coupled receptor). Intracellularly, SPP
regulates cell growth, survival, motility, and cytoskeletal changes. SPP levels are regulated by
sphingosine kinases that specifically phosphorylate D-erythro-sphingosine to SPP. The importance of
sphingosine kinase in cell signaling is indicated by the fact that various stimuli, including
platelet-derived growth factor (PDGF), nerve growth factor, and activation of protein kinase C,
increase cellular levels of SPP by activation of sphingosine kinase, and the fact that competitive
inhibitors of the enzyme selectively inhibit cell proliferation induced by PDGF (Kohama et al.,
supra).

Purine Nucleotide Kinases

The purine nucleotide kinases, adenylate kinase (ATP:AMP phosphotransferase, or AdK) and
guanylate kinase (ATP:GMP phosphotransferase, or GuK) play a key role in nucleotide metabolism
and are crucial to the synthesis and regulation of cellular levels of ATP and GTP, respectively. These
two molecules are precursors in DNA and RNA synthesis in growing cells and provide the primary
source of biochemical energy in cells (ATP), and signal transduction pathways (GTP). Inhibition of
various steps in the synthesis of these two molecules has been the basis of many antiproliferative
drugs for cancer and antiviral therapy (Pillwein, K. et al. (1990) Cancer Res. 50:1576-1579).

AdK is found in almost all cell types and is especially abundant in cells having high rates of
ATP synthesis and utilization such as skeletal muscle. In these cells AdK is physically associated
with mitochondria and myofibrils, the subcellular structures that are involved in energy production
and utilization, respectively. Recent studies have demonstrated a major function for AdK in
transferring high energy phosphoryls from metabolic processes generating ATP to cellular
components consuming ATP (Zeleznikar, R.J. et al. (1995) J. Biol. Chem. 270:7311-7319). Thus
AdK may have a pivotal role in maintaining energy production in cells, particularly those having a
high rate of growth or metabolism such as cancer cells, and may provide a target for suppression of its
activity in order to treat certain cancers. Alternatively, reduced AdK activity may be a source of
various metabolic, muscle-energy disorders that can result in cardiac or respiratory failure and may be
treatable by increasing AdK activity.

GuK, in addition to providing a key step in the synthesis of GTP for RNA and DNA
synthesis, also fulfills an essential function in signal transduction pathways of cells through the
regulation of GDP and GTP. Specifically, GTP binding to membrane associated G proteins mediates
the activation of cell receptors, subsequent intracellular activation of adenyl cyclase, and production



of the second messenger, cyclic AMP. GDP binding to G proteins inhibits these processes. GDP and
GTP levels also control the activity of certain oncogenic proteins such as p21ras, which is known to be
involved in the control of cell proliferation and oncogenesis (Bos, J.L. (1989) Cancer Res. 49:4682-4689). High
ratios of GTP:GDP caused by suppression of GuK cause activation of p21ras and promote
oncogenesis. Increasing GuK activity to increase levels of GDP and reduce the GTP:GDP ratio may
provide a therapeutic strategy to reverse oncogenesis.

GuK is an important enzyme in the phosphorylation and activation of certain antiviral drugs
useful in the treatment of herpes virus infections. These drugs include the guanine homologs
acyclovir and buciclovir (Miller, W.H. and R.L. Miller (1980) J. Biol. Chem. 255:7204-7207;
Stenberg, K. et al. (1986) J. Biol. Chem. 261:2134-2139). Increasing GuK activity in infected cells
may provide a therapeutic strategy for augmenting the effectiveness of these drugs and possibly for
reducing the necessary dosages of the drugs.
Pyrimidine Kinases

The pyrimidine kinases are deoxycytidine kinase and thymidine kinase 1 and 2.
Deoxycytidine kinase is located in the nucleus, and thymidine kinase 1 and 2 are found in the cytosol
(Johansson, M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:11941-11945). Phosphorylation of
deoxyribonucleosides by pyrimidine kinases provides an alternative pathway to de novo synthesis of
DNA precursors. The role of pyrimidine kinases, like purine kinases, in phosphorylation is critical to
the activation of several chemotherapeutically important nucleoside analogues (Arner, E.S. and S.
Eriksson (1995) Pharmacol. Ther. 67:155-186).

PHOSPHATASES 

Protein phosphatases are generally characterized as either serine/threonine-specific or
tyrosine-specific based on their preferred phospho-amino acid substrate. However, some phosphatases (DSPs,
for dual specificity phosphatases) can act on phosphorylated tyrosine, serine, or threonine residues.
The protein serine/threonine phosphatases (PSPs) are important regulators of many cAMP-mediated
hormone responses in cells. Protein tyrosine phosphatases (PTPs) play a significant role in cell cycle
and cell signaling processes. Another family of phosphatases is the acid phosphatase or histidine acid
phosphatase (HAP) family whose members hydrolyze phosphate esters at acidic pH conditions.

PSPs are found in the cytosol, nucleus, and mitochondria and in association with cytoskeletal
and membranous structures in most tissues, especially the brain. Some PSPs require divalent cations,
such as Ca2+ or Mn2+, for activity. PSPs play important roles in glycogen metabolism, muscle
contraction, protein synthesis, T cell function, neuronal activity, oocyte maturation, and hepatic
metabolism (reviewed in Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508). PSPs can be separated
into two classes. The PPP class includes PP1, PP2A, PP2B/calcineurin, PP4, PP5, PP6, and PP7.




Members of this class are composed of a homologous catalytic subunit bearing a very highly
conserved signature sequence, coupled with one or more regulatory subunits (PROSITE
PDOC00115). Further interactions with scaffold and anchoring molecules determine the intracellular
localization of PSPs and substrate specificity. The PPM class consists of several closely related
isoforms of PP2C and is evolutionarily unrelated to the PPP class.

PP1 dephosphorylates many of the proteins phosphorylated by cyclic AMP-dependent protein
kinase (PKA) and is an important regulator of many cAMP-mediated hormone responses in cells. A
number of isoforms have been identified, with the alpha and beta forms being produced by alternative
splicing of the same gene. Both ubiquitous and tissue-specific targeting proteins for PP1 have been
identified. In the brain, inhibition of PP1 activity by the dopamine and adenosine 3',5'-
monophosphate-regulated phosphoprotein of 32 kDa (DARPP-32) is necessary for normal dopamine
response in neostriatal neurons (reviewed in Price, N.E. and M.C. Mumby (1999) Curr. Opin.
Neurobiol. 9:336-342). PP1, along with PP2A, has been shown to limit motility in microvascular
endothelial cells, suggesting a role for PSPs in the inhibition of angiogenesis (Gabel, S. et al. (1999)
Otolaryngol. Head Neck Surg. 121:463-468).

PP2A is the main serine/threonine phosphatase. The core PP2A enzyme consists of a single
36 kDa catalytic subunit (C) associated with a 65 kDa scaffold subunit (A), whose role is to recruit
additional regulatory subunits (B). Three gene families encoding B subunits are known (PR55, PR61,
and PR72), each of which contains multiple isoforms, and additional families may exist (Millward,
T.A. et al. (1999) Trends Biochem. Sci. 24:186-191). These "B-type" subunits are cell type- and tissue-
specific and determine the substrate specificity, enzymatic activity, and subcellular localization of the
holoenzyme. The PR55 family is highly conserved and bears a conserved motif (PROSITE
PDOC00785). PR55 increases PP2A activity toward mitogen-activated protein kinase (MAPK) and
MAPK kinase (MEK). PP2A dephosphorylates the MAPK active site, inhibiting the cell's entry into
mitosis. Several proteins can compete with PR55 for PP2A core enzyme binding, including the CKII
kinase catalytic subunit, polyomavirus middle and small T antigens, and SV40 small t antigen.
Viruses may use this mechanism to commandeer PP2A and stimulate progression of the cell through
the cell cycle (Pallas, D.C. et al. (1992) J. Virol. 66:886-893). Altered MAP kinase expression is also
implicated in a variety of disease conditions including cancer, inflammation, immune disorders, and
disorders affecting growth and development. PP2A, in fact, can dephosphorylate and modulate the
activities of more than 30 protein kinases in vitro, and other evidence suggests that the same is true in
vivo for such kinases as PKB, PKC, the calmodulin-dependent kinases, ERK family MAP kinases,
cyclin-dependent kinases, and the IkB kinases (reviewed in Millward et al., supra). PP2A is itself a
substrate for CKI and CKII kinases, and can be stimulated by polycationic macromolecules. A PP2A-
like phosphatase is necessary to maintain the G1 phase destruction of mammalian cyclins A and B


(Bastians, H. et al. (1999) Mol. Biol. Cell 10:3927-3941). PP2A is a major activity in the brain and is
implicated in regulating neurofilament stability and normal neural function, particularly the
phosphorylation of the microtubule-associated protein tau. Hyperphosphorylation of tau has been
proposed to lead to the neuronal degeneration seen in Alzheimer's disease (reviewed in Price and
Mumby, supra).

PP2B, or calcineurin, is a Ca2+-activated dimeric phosphatase and is particularly abundant in
the brain. It consists of catalytic and regulatory subunits, and is activated by the binding of the
calcium/calmodulin complex. Calcineurin is the target of the immunosuppressant drugs cyclosporine
and FK506. Along with other cellular factors, these drugs interact with calcineurin and inhibit
phosphatase activity. In T cells, this blocks the calcium dependent activation of the NF-AT family of
transcription factors, leading to immunosuppression. This family is widely distributed, and it is likely
that calcineurin regulates gene expression in other tissues as well. In neurons, calcineurin modulates
functions which range from the inhibition of neurotransmitter release to desensitization of
postsynaptic NMDA-receptor coupled calcium channels to long term memory (reviewed in Price and
Mumby, supra).

Other members of the PPP class have recently been identified (Cohen, P.T. (1997) Trends
Biochem. Sci. 22:245-251). One of them, PP5, contains regulatory domains with tetratricopeptide
repeats. It can be activated by polyunsaturated fatty acids and anionic phospholipids in vitro and
appears to be involved in a number of signaling pathways, including those controlled by atrial
natriuretic peptide or steroid hormones (reviewed in Andreeva, A.V. and M.A. Kutuzov (1999) Cell
Signal. 11:555-562).

PP2C is a ~42 kDa monomer with broad substrate specificity and is dependent on divalent
cations (mainly Mn2+ or Mg2+) for its activity. PP2C proteins share a conserved N-terminal region
with an invariant DGH motif, which contains an aspartate residue involved in cation binding
(PROSITE PDOC00792). Targeting proteins and mechanisms regulating PP2C activity have not
been identified. PP2C has been shown to inhibit the stress-responsive p38 and Jun kinase (JNK)
pathways (Takekawa, M. et al. (1998) EMBO J. 17:4744-4752).

In contrast to PSPs, tyrosine-specific phosphatases (PTPs) are generally monomeric proteins
of very diverse size (from 20 kDa to greater than 100 kDa) and structure that function primarily in the
transduction of signals across the plasma membrane. PTPs are categorized as either soluble
phosphatases or transmembrane receptor proteins that contain a phosphatase domain. All PTPs share
a conserved catalytic domain of about 300 amino acids which contains the active site. The active site
consensus sequence includes a cysteine residue which executes a nucleophilic attack on the phosphate
moiety during catalysis (Neel, B.G. and N.K. Tonks (1997) Curr. Opin. Cell Biol. 9:193-204).

Receptor PTPs are made up of an N-terminal extracellular domain of variable length, a


transmembrane region, and a cytoplasmic region that generally contains two copies of the catalytic
domain. Although only the first copy seems to have enzymatic activity, the second copy apparently
affects the substrate specificity of the first. The extracellular domains of some receptor PTPs contain
fibronectin-like repeats, immunoglobulin-like domains, MAM domains (an extracellular motif likely
to have an adhesive function), or carbonic anhydrase-like domains (PROSITE PDOC00323). This
wide variety of structural motifs accounts for the diversity in size and specificity of PTPs.

PTPs play important roles in biological processes such as cell adhesion, lymphocyte
activation, and cell proliferation. PTPs μ and κ are involved in cell-cell contacts, perhaps regulating
cadherin/catenin function. A number of PTPs affect cell spreading, focal adhesions, and cell motility,
most of them via the integrin/tyrosine kinase signaling pathway (reviewed in Neel and Tonks, supra).
CD45 phosphatases regulate signal transduction and lymphocyte activation (Ledbetter, J.A. et al.
(1988) Proc. Natl. Acad. Sci. USA 85:8628-8632). Soluble PTPs containing Src-homology-2
domains have been identified (SHPs), suggesting that these molecules might interact with receptor
tyrosine kinases. SHP-1 regulates cytokine receptor signaling by controlling the Janus family PTKs
in hematopoietic cells, as well as signaling by the T-cell receptor and c-Kit (reviewed in Neel and
Tonks, supra). M-phase inducer phosphatase plays a key role in the induction of mitosis by
dephosphorylating and activating the PTK CDC2, leading to cell division (Sadhu, K. et al. (1990)
Proc. Natl. Acad. Sci. USA 87:5139-5143). In addition, the genes encoding at least eight PTPs have
been mapped to chromosomal regions that are translocated or rearranged in various neoplastic
conditions, including lymphoma, small cell lung carcinoma, leukemia, adenocarcinoma, and
neuroblastoma (reviewed in Charbonneau, H. and N.K. Tonks (1992) Annu. Rev. Cell Biol. 8:463-
493). The PTP enzyme active site comprises the consensus sequence of the MTM1 gene family. The
MTM1 gene is responsible for X-linked recessive myotubular myopathy, a congenital muscle
disorder that has been linked to Xq28 (Kioschis, P. et al. (1998) Genomics 54:256-266). Many PTKs
are encoded by oncogenes, and it is well known that oncogenesis is often accompanied by increased
tyrosine phosphorylation activity. It is therefore possible that PTPs may serve to prevent or reverse
cell transformation and the growth of various cancers by controlling the levels of tyrosine
phosphorylation in cells. This is supported by studies showing that overexpression of PTP can
suppress transformation in cells and that specific inhibition of PTP can enhance cell transformation
(Charbonneau and Tonks, supra).

Apyrases are enzymes that efficiently hydrolyze ATP and ADP and may function either
intracellularly or extracellularly. One type of apyrase, ATP-diphosphohydrolase, catalyzes the hydrolysis of
phosphoanhydride bonds of nucleoside tri- and di-phosphates in the presence of divalent cations
(Nourizad, N. et al. (2003) Protein Expr. Purif. 27:229-237).

Dual specificity phosphatases (DSPs) are structurally more similar to the PTPs than the PSPs.




DSPs bear an extended PTP active site motif with an additional 7 amino acid residues. DSPs are
primarily associated with cell proliferation and include the cell cycle regulators cdc25A, B, and C.
The phosphatases DUSP1 and DUSP2 inactivate the MAPK family members ERK (extracellular
signal-regulated kinase), JNK (c-Jun N-terminal kinase), and p38 on both tyrosine and threonine
residues (PROSITE PDOC00323, supra). In the activated state, these kinases have been implicated
in neuronal differentiation, proliferation, oncogenic transformation, platelet aggregation, and
apoptosis. Thus, DSPs are necessary for proper regulation of these processes (Muda, M. et al. (1996)
J. Biol. Chem. 271:27205-27208). The tumor suppressor PTEN is a DSP that also shows lipid
phosphatase activity. It seems to negatively regulate interactions with the extracellular matrix and
maintains sensitivity to apoptosis. PTEN has been implicated in the prevention of angiogenesis (Giri,
D. and M. Ittmann (1999) Hum. Pathol. 30:419-424) and abnormalities in its expression are
associated with numerous cancers (reviewed in Tamura, M. et al. (1999) J. Natl. Cancer Inst.
91:1820-1828).

Histidine acid phosphatase (HAP; EXPASY EC 3.1.3.2), also known as acid phosphatase,
hydrolyzes a wide spectrum of substrates including alkyl, aryl, and acyl orthophosphate monoesters
and phosphorylated proteins at low pH. HAPs share two regions of conserved sequences, each
centered around a histidine residue which is involved in catalytic activity. Members of the HAP
family include lysosomal acid phosphatase (LAP) and prostatic acid phosphatase (PAP), both
sensitive to inhibition by L-tartrate (PROSITE PDOC00538).

Synaptojanin, a polyphosphoinositide phosphatase, dephosphorylates phosphoinositides at
positions 3, 4 and 5 of the inositol ring. Synaptojanin is a major presynaptic protein found at clathrin-
coated endocytic intermediates in nerve terminals, and binds the clathrin coat-associated protein,
EPS15. This binding is mediated by the C-terminal region of synaptojanin-170, which has 3 Asp-Pro-
Phe amino acid repeats. Further, this 3 residue repeat has been found to be the binding site for the
EH domains of EPS15 (Haffner, C. et al. (1997) FEBS Lett. 419:175-180). Additionally,
synaptojanin may potentially regulate interactions of endocytic proteins with the plasma membrane,
and be involved in synaptic vesicle recycling (Brodin, L. et al. (2000) Curr. Opin. Neurobiol. 10:312-
320). Mice with a targeted disruption in the synaptojanin 1 gene (Synj1) were shown to
support coat formation of endocytic vesicles more effectively than wild-type mice,
suggesting that Synj1 can act as a negative regulator of membrane-coat protein interactions. These
findings provide genetic evidence for a crucial role of phosphoinositide metabolism in synaptic
vesicle recycling (Cremona, O. et al. (1999) Cell 99:179-188).

Expression profiling

Microarrays are analytical tools used in bioanalysis. A microarray has a plurality of
molecules spatially distributed over, and stably associated with, the surface of a solid support.






Microarrays of polypeptides, polynucleotides, and/or antibodies have been developed and find use in 
a variety of applications, such as gene sequencing, monitoring gene expression, gene mapping, 
bacterial identification, drug discovery, and combinatorial chemistry. 

One area in particular in which microarrays find use is in gene expression analysis. Array
technology can provide a simple way to explore the expression of a single polymorphic gene or the
expression profile of a large number of related or unrelated genes. When the expression of a single
gene is examined, arrays are employed to detect the expression of a specific gene or its variants.
When an expression profile is examined, arrays provide a platform for identifying genes that are
tissue specific, are affected by a substance being tested in a toxicology assay, are part of a signaling
cascade, carry out housekeeping functions, or are specifically related to a particular genetic
predisposition, condition, disease, or disorder.
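
The comparison described above, in which an expression profile is used to find genes affected by a test substance, can be illustrated with a minimal sketch. It assumes background-corrected signal intensities for a treated and a control sample and flags genes whose log2 ratio reaches a two-fold threshold. The gene names, intensity values, threshold, and function names are all hypothetical and are not taken from this document.

```python
import math


def log2_fold_changes(treated, control, floor=1.0):
    """Return {gene: log2(treated/control)} for genes measured in both samples.

    Intensities below `floor` are raised to `floor` so the ratio stays defined.
    """
    changes = {}
    for gene in treated.keys() & control.keys():
        t = max(treated[gene], floor)
        c = max(control[gene], floor)
        changes[gene] = math.log2(t / c)
    return changes


def differentially_expressed(changes, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold (1.0 = two-fold)."""
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical signal intensities in arbitrary units (illustration only).
control = {"GENE_A": 200.0, "GENE_B": 1500.0, "GENE_C": 800.0}
treated = {"GENE_A": 800.0, "GENE_B": 1400.0, "GENE_C": 100.0}

changes = log2_fold_changes(treated, control)
hits = differentially_expressed(changes)
print(hits)
```

In this toy data set GENE_A is four-fold up and GENE_C is eight-fold down, so both are flagged, while GENE_B is essentially unchanged. A real analysis would also require normalization across arrays and replicate-based statistics, which this sketch omits.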
Neurological disorders 

Characterization of region-specific gene expression in the human brain provides a context 
and background for molecular neurobiology on a variety of neurological disorders. 

Alzheimer's disease (AD) is a progressive, neurodestructive process of the human neocortex,
characterized by the deterioration of memory and higher cognitive function. A progressive and
irreversible brain disorder, AD is characterized by three major pathogenic episodes involving (a) an
aberrant processing and deposition of beta-amyloid precursor protein (betaAPP) to form neurotoxic
beta-amyloid (betaA) peptides and an aggregated insoluble polymer of betaA that forms the senile
plaque, (b) the establishment of intraneuronal neuritic tau pathology yielding widespread deposits of
argyrophilic neurofibrillary tangles (NFT) and (c) the initiation and proliferation of a brain-specific
inflammatory response. These three seemingly disparate attributes of AD etiopathogenesis are linked
by the fact that proinflammatory microglia, reactive astrocytes and their associated cytokines and
chemokines are associated with the biology of the microtubule associated protein tau, betaA
speciation and aggregation. Missense mutations in the presenilin genes PS1 and PS2, implicated in
early onset familial AD, cause abnormal betaAPP processing with resultant overproduction of
betaA42 and related neurotoxic peptides. Specific betaA fragments such as betaA42 can further
potentiate proinflammatory mechanisms. Expression of the inducible oxidoreductase
cyclooxygenase-2 and cytosolic phospholipase A2 (cPLA2) is strongly activated during cerebral
ischemia and trauma, epilepsy and AD, indicating the induction of proinflammatory gene pathways as
a response to brain injury. Neurotoxic metals such as aluminum and zinc, both implicated in AD
etiopathogenesis, and arachidonic acid, a major metabolite of brain cPLA2 activity, each polymerize
hyperphosphorylated tau to form NFT-like bundles. Studies have identified a reduced risk for AD in
patients aged over 70 years who were previously treated with non-steroidal anti-inflammatory drugs
for non-CNS afflictions that include arthritis. (For a review of the interrelationships between the




mechanisms of PS1, PS2 and betaAPP gene expression, tau and betaA deposition and the induction,
regulation and proliferation in AD of the neuroinflammatory response, see Lukiw, W.J. and N.G.
Bazan (2000) Neurochem. Res. 25:1173-1184).
Breast Cancer 

More than 180,000 new cases of breast cancer are diagnosed each year, and the mortality rate
for breast cancer approaches 10% of all deaths in females between the ages of 45-54 (Gish, K. (1999)
AWIS Magazine 28:7-10). However, the survival rate based on early diagnosis of localized breast
cancer is extremely high (97%), compared with the advanced stage of the disease in which the tumor
has spread beyond the breast (22%). Current procedures for clinical breast examination are lacking in
sensitivity and specificity, and efforts are underway to develop comprehensive gene expression
profiles for breast cancer that may be used in conjunction with conventional screening methods to
improve diagnosis and prognosis of this disease (Perou, C.M. et al. (2000) Nature 406:747-752).

Mutations in two genes, BRCA1 and BRCA2, are known to greatly predispose a woman to
breast cancer and may be passed on from parents to children (Gish, supra). However, this type of
hereditary breast cancer accounts for only about 5% to 9% of breast cancers, while the vast majority
of breast cancer is due to non-inherited mutations that occur in breast epithelial cells.

The relationship between expression of epidermal growth factor (EGF) and its receptor,
EGFR, to human mammary carcinoma has been particularly well studied. (See Khazaie, K. et al.
(1993) Cancer and Metastasis Rev. 12:255-274, and references cited therein for a review of this area.)
Overexpression of EGFR, particularly coupled with down-regulation of the estrogen receptor, is a
marker of poor prognosis in breast cancer patients. In addition, EGFR expression in breast tumor
metastases is frequently elevated relative to the primary tumor, suggesting that EGFR is involved in
tumor progression and metastasis. This is supported by accumulating evidence that EGF has effects
on cell functions related to metastatic potential, such as cell motility, chemotaxis, secretion and
differentiation. Changes in expression of other members of the erbB receptor family, of which EGFR
is one, have also been implicated in breast cancer. The abundance of erbB receptors, such as HER-
2/neu, HER-3, and HER-4, and their ligands in breast cancer points to their functional importance in
the pathogenesis of the disease, and may therefore provide targets for therapy of the disease (Bacus,
S.S. et al. (1994) Am. J. Clin. Pathol. 102:S13-S24). Other known markers of breast cancer include a
human secreted frizzled protein mRNA that is downregulated in breast tumors; the matrix Gla
protein which is overexpressed in human breast carcinoma cells; Drg1 or RTP, a gene whose
expression is diminished in colon, breast, and prostate tumors; maspin, a tumor suppressor gene
downregulated in invasive breast carcinomas; and CaN19, a member of the S100 protein family, all of
which are down-regulated in mammary carcinoma cells relative to normal mammary epithelial cells
(Zhou, Z. et al. (1998) Int. J. Cancer 78:95-99; Chen, L. et al. (1990) Oncogene 5:1391-1395; Ulrix,






W. et al. (1999) FEBS Lett. 455:23-26; Sager, R. et al. (1996) Curr. Top. Microbiol. Immunol. 213:51-64;
and Lee, S.W. et al. (1992) Proc. Natl. Acad. Sci. USA 89:2504-2508).

Cell lines derived from human mammary epithelial cells at various stages of breast cancer
provide a useful model to study the process of malignant transformation and tumor progression as it
has been shown that these cell lines retain many of the properties of their parental tumors for lengthy
culture periods (Wistuba, I.I. et al. (1998) Clin. Cancer Res. 4:2931-2938). Such a model is
particularly useful for comparing phenotypic and molecular characteristics of human mammary
epithelial cells at various stages of malignant transformation.
Colon Cancer 

While soft tissue sarcomas are relatively rare, more than 50% of new patients diagnosed with
the disease will die from it. The molecular pathways leading to the development of sarcomas are
relatively unknown, due to the rarity of the disease and variation in pathology. Colon cancer evolves
through a multi-step process whereby pre-malignant colonocytes undergo a relatively defined
sequence of events leading to tumor formation. Several factors participate in the process of tumor
progression and malignant transformation including genetic factors, mutations, and selection.

To understand the nature of gene alterations in colorectal cancer, a number of studies have
focused on the inherited syndromes. Familial adenomatous polyposis (FAP) is caused by mutations
in the adenomatous polyposis coli gene (APC), resulting in truncated or inactive forms of the protein.
This tumor suppressor gene has been mapped to chromosome 5q. Hereditary nonpolyposis colorectal
cancer (HNPCC) is caused by mutations in mismatch repair genes. Although hereditary colon cancer
syndromes occur in a small percentage of the population and most colorectal cancers are considered
sporadic, knowledge from studies of the hereditary syndromes can be generally applied. For instance,
somatic mutations in APC occur in at least 80% of sporadic colon tumors. APC mutations are
thought to be the initiating event in the disease. Other mutations occur subsequently. Approximately
50% of colorectal cancers contain activating mutations in ras, while 85% contain inactivating
mutations in p53. Changes in all of these genes lead to gene expression changes in colon cancer.
Lung Cancer 

The potential application of gene expression profiling is particularly relevant to improving
diagnosis, prognosis, and treatment of cancer, such as lung cancer. Lung cancer is the leading cause
of cancer death in the United States, affecting more than 100,000 men and 50,000 women each year.
Nearly 90% of the patients diagnosed with lung cancer are cigarette smokers. Tobacco smoke
contains thousands of noxious substances that induce carcinogen metabolizing enzymes and covalent
DNA adduct formation in the exposed bronchial epithelium. In nearly 80% of patients diagnosed
with lung cancer, metastasis has already occurred. Most commonly lung cancers metastasize to
pleura, brain, bone, pericardium, and liver. The decision to treat with surgery, radiation therapy, or


chemotherapy is made on the basis of tumor histology, response to growth factors or hormones, and
sensitivity to inhibitors or drugs. With current treatments, most patients die within one year of
diagnosis. Earlier diagnosis and a systematic approach to identification, staging, and treatment of
lung cancer could positively affect patient outcome.

Lung cancers progress through a series of morphologically distinct stages from hyperplasia
to invasive carcinoma. Malignant lung cancers are divided into two groups comprising four
histopathological classes. The Non Small Cell Lung Carcinoma (NSCLC) group includes squamous
cell carcinomas, adenocarcinomas, and large cell carcinomas and accounts for about 70% of all lung
cancer cases. Adenocarcinomas typically arise in the peripheral airways and often form mucin
secreting glands. Squamous cell carcinomas typically arise in proximal airways. The histogenesis of
squamous cell carcinomas may be related to chronic inflammation and injury to the bronchial
epithelium, leading to squamous metaplasia. The Small Cell Lung Carcinoma (SCLC) group
accounts for about 20% of lung cancer cases. SCLCs typically arise in proximal airways and exhibit
a number of paraneoplastic syndromes including inappropriate production of adrenocorticotropin and
anti-diuretic hormone.

Lung cancer cells accumulate numerous genetic lesions, many of which are associated with
cytologically visible chromosomal aberrations. The high frequency of chromosomal deletions
associated with lung cancer may reflect the role of multiple tumor suppressor loci in the etiology of
this disease. Deletion of the short arm of chromosome 3 is found in over 90% of cases and represents
one of the earliest genetic lesions leading to lung cancer. Deletions at chromosome arms 9p and 17p
are also common. Other frequently observed genetic lesions include overexpression of telomerase,
activation of oncogenes such as K-ras and c-myc, and inactivation of tumor suppressor genes such as
RB, p53, and CDKN2.

Genes differentially regulated in lung cancer have been identified by a variety of methods.
Using mRNA differential display technology, Manda et al. (1999; Genomics 51:5-14) identified five
genes differentially expressed in lung cancer cell lines compared to normal bronchial epithelial cells.
Among the known genes, pulmonary surfactant apoprotein A and alpha 2 macroglobulin were down
regulated whereas nm23H1 was upregulated. Petersen et al. (2000; Int. J. Cancer 86:512-517) used
suppression subtractive hybridization to identify 552 clones differentially expressed in lung tumor
derived cell lines, 205 of which represented known genes. Among the known genes,
thrombospondin-1, fibronectin, intercellular adhesion molecule 1, and cytokeratins 6 and 18 were
previously observed to be differentially expressed in lung cancers. Wang et al. (2000; Oncogene
19:1519-1528) used a combination of microarray analysis and subtractive hybridization to identify 17
genes differentially overexpressed in squamous cell carcinoma compared with normal lung
epithelium. Among the known genes they identified were keratin isoform 6, KOC, SPRC, IGFb2,






connexin 26, plakophilin 1, and cytokeratin 13.
Ovarian Cancer 

Ovarian cancer is the leading cause of death from a gynecologic cancer. The majority of
ovarian cancers are derived from epithelial cells, and 70% of patients with epithelial ovarian cancers
present with late-stage disease. As a result, the long-term survival rate for this disease is very low.
Identification of early-stage markers for ovarian cancer would significantly increase the survival rate.
Genetic variations involved in ovarian cancer development include mutation of p53 and microsatellite
instability. Gene expression patterns likely vary when normal ovary is compared to ovarian tumors.
Prostate Cancer 

As with most tumors, prostate cancer develops through a multistage progression ultimately
resulting in an aggressive tumor phenotype. The initial step in tumor progression involves the
hyperproliferation of normal luminal and/or basal epithelial cells. Androgen responsive cells become
hyperplastic and evolve into early-stage tumors. Although early-stage tumors are often androgen
sensitive and respond to androgen ablation, a population of androgen independent cells evolves from
the hyperplastic population. These cells represent a more advanced form of prostate tumor that may
become invasive and potentially become metastatic to the bone, brain, or lung. A variety of genes
may be differentially expressed during tumor progression. For example, loss of heterozygosity
(LOH) is frequently observed on chromosome 8p in prostate cancer. Fluorescence in situ
hybridization (FISH) revealed a deletion for at least 1 locus on 8p in 29 (69%) tumors, with a
significantly higher frequency of the deletion on 8p21.2-p21.1 in advanced prostate cancer than in
localized prostate cancer, implying that deletions on 8p22-p21.3 play an important role in tumor
differentiation, while 8p21.2-p21.1 deletion plays a role in progression of prostate cancer (Oba, K. et
al. (2001) Cancer Genet. Cytogenet. 124:20-26).

PZ-HPV-7 was derived from epithelial cells cultured from normal tissue from the peripheral
zone of the prostate. The cells were transformed by transfection with HPV18. Immunocytochemical
analysis showed expression of keratins 5 and 8 and also the early region 6 (E6) oncoprotein of HPV.
The cells are negative for prostate specific antigen (PSA).

Interleukin 6 (IL-6) is a multifunctional protein that plays important roles in host defense,
acute phase reactions, immune responses, and hematopoiesis. According to the type of biological
responses being studied, IL-6 was previously named interferon-b2, 26-kDa protein, B cell stimulatory
factor-2 (BSF-2), hybridoma/plasmacytoma growth factor, hepatocyte stimulating factor, cytotoxic T
cell differentiation factor, and macrophage-granulocyte inducing factor 2A (MGI-2A). The IL-6
designation was adopted after these variously named proteins were found to be identical on the basis
of their amino acid and/or nucleotide sequences. IL-6 is expressed by a variety of normal and
transformed cells including T cells, B cells, monocytes/macrophages, fibroblasts, hepatocytes,

keratinocytes, astrocytes, vascular endothelial cells, and various tumor cells. The production of IL-6
is upregulated by numerous signals including mitogenic or antigenic stimulation, LPS, calcium
ionophore, IL-1, IL-2, IFN, TNF, PDGF, and viruses. IL-4 and IL-13 inhibit IL-6 expression in
monocytes.
Obesity

The most important function of adipose tissue is its ability to store and release fat during
periods of feeding and fasting. White adipose tissue is the major energy reserve in periods of excess
energy use. Its primary purpose is mobilization during energy deprivation. Understanding how
various molecules regulate adiposity and energy balance in physiological and pathophysiological
situations may lead to the development of novel therapeutics for human obesity. Adipose tissue is
also one of the important target tissues for insulin. Adipogenesis and insulin resistance in type II
diabetes are linked and present intriguing relations. Most patients with type II diabetes are obese and
obesity in turn causes insulin resistance.

The majority of research in adipocyte biology to date has been done using transformed mouse
preadipocyte cell lines. The culture condition which stimulates mouse preadipocyte differentiation is
different from that for inducing human primary preadipocyte differentiation. In addition, primary
cells are diploid and may therefore reflect the in vivo context better than aneuploid cell lines.
Understanding the gene expression profile during adipogenesis in humans will lead to an
understanding of the fundamental mechanism of adiposity regulation. Furthermore, through
comparing the gene expression profiles of adipogenesis between donors with normal weight and
donors with obesity, identification of crucial genes, potential drug targets for obesity and type II
diabetes, will be possible.

Thiazolidinediones (TZDs) act as agonists for the peroxisome-proliferator-activated receptor
gamma (PPARγ), a member of the nuclear hormone receptor superfamily. TZDs reduce
hyperglycemia, hyperinsulinemia, and hypertension, in part by promoting glucose metabolism and
inhibiting gluconeogenesis. Roles for PPARγ and its agonists have been demonstrated in a wide
range of pathological conditions including diabetes, obesity, hypertension, atherosclerosis, polycystic
ovarian syndrome, and cancers such as breast, prostate, liposarcoma, and colon cancer.

The mechanism by which TZDs and other PPARγ agonists enhance insulin sensitivity is not
fully understood, but may involve the ability of PPARγ to promote adipogenesis. When ectopically
expressed in cultured preadipocytes, PPARγ is a potent inducer of adipocyte differentiation. TZDs,
in combination with insulin and other factors, can also enhance differentiation of human
preadipocytes in culture (Adams et al. (1997) J. Clin. Invest. 100:3149-3153). The relative potency
of different TZDs in promoting adipogenesis in vitro is proportional to both their insulin sensitizing
effects in vivo, and their ability to bind and activate PPARγ in vitro. Interestingly, adipocytes derived
from omental adipose depots are refractory to the effects of TZDs. It has therefore been suggested
that the insulin sensitizing effects of TZDs may result from their ability to promote adipogenesis in
subcutaneous adipose depots (Adams et al., supra). Further, dominant negative mutations in the
PPARγ gene have been identified in two non-obese subjects with severe insulin resistance,
hypertension, and overt non-insulin dependent diabetes mellitus (NIDDM) (Barroso et al. (1998)
Nature 402:880-883).

NIDDM is the most common form of diabetes mellitus, a chronic metabolic disease that
affects 143 million people worldwide. NIDDM is characterized by abnormal glucose and lipid
metabolism that results from a combination of peripheral insulin resistance and defective insulin
secretion. NIDDM has a complex, progressive etiology and a high degree of heritability. Numerous
complications of diabetes including heart disease, stroke, renal failure, retinopathy, and peripheral
neuropathy contribute to the high rate of morbidity and mortality.

At the molecular level, PPARγ functions as a ligand activated transcription factor. In the
presence of ligand, PPARγ forms a heterodimer with the retinoid X receptor (RXR) which then
activates transcription of target genes containing one or more copies of a PPARγ response element
(PPRE). Many genes important in lipid storage and metabolism contain PPREs and have been
identified as PPARγ targets, including PEPCK, aP2, LPL, ACS, and FAT-P (Auwerx, J. (1999)
Diabetologia 42:1033-1049). Multiple ligands for PPARγ have been identified. These include a
variety of fatty acid metabolites; synthetic drugs belonging to the TZD class, such as Pioglitazone and
Rosiglitazone (BRL49653); and certain non-glitazone tyrosine analogs such as GI262570 and
GW1929. The prostaglandin derivative 15-dPGJ2 is a potent endogenous ligand for PPARγ.

Expression of PPARγ is very high in adipose but barely detectable in skeletal muscle, the
primary site for insulin stimulated glucose disposal in the body. PPARγ is also moderately expressed
in large intestine, kidney, liver, vascular smooth muscle, hematopoietic cells, and macrophages. The
high expression of PPARγ in adipose tissue suggests that the insulin sensitizing effects of TZDs may
result from alterations in the expression of one or more PPARγ regulated genes in adipose tissue.
Identification of PPARγ target genes will contribute to better drug design and the development of
novel therapeutic strategies for diabetes, obesity, and other conditions.

Systematic attempts to identify PPARγ target genes have been made in several rodent models
of obesity and diabetes (Suzuki et al. (2000) Jpn. J. Pharmacol. 84:113-123; Way et al. (2001)
Endocrinology 142:1269-1277). However, a serious drawback of the rodent gene expression studies
is that significant differences exist between human and rodent models of adipogenesis, diabetes, and
obesity (Taylor (1999) Cell 97:9-12; Gregoire et al. (1998) Physiol. Reviews 78:783-809). Therefore,
an unbiased approach to identifying TZD regulated genes in primary cultures of human tissues is
necessary to fully elucidate the molecular basis for diseases associated with PPARγ activity.

There is a need in the art for new compositions, including nucleic acids and proteins, for the
diagnosis, prevention, and treatment of cardiovascular diseases, immune system disorders,
neurological disorders, disorders affecting growth and development, lipid disorders, cell proliferative
disorders, and cancers.

SUMMARY OF THE INVENTION 

Various embodiments of the invention provide purified polypeptides, kinases and
phosphatases, referred to collectively as 'KPP' and individually as 'KPP-1,' 'KPP-2,' 'KPP-3,'
'KPP-4,' 'KPP-5,' 'KPP-6,' 'KPP-7,' 'KPP-8,' 'KPP-9,' 'KPP-10,' 'KPP-11,' 'KPP-12,' 'KPP-13,'
'KPP-14,' 'KPP-15,' 'KPP-16,' 'KPP-17,' 'KPP-18,' 'KPP-19,' 'KPP-20,' 'KPP-21,' 'KPP-22,' 'KPP-23,'
'KPP-24,' 'KPP-25,' 'KPP-26,' 'KPP-27,' 'KPP-28,' 'KPP-29,' 'KPP-30,' 'KPP-31,' 'KPP-32,'
'KPP-33,' 'KPP-34,' 'KPP-35,' 'KPP-36,' 'KPP-37,' 'KPP-38,' 'KPP-39,' 'KPP-40,' 'KPP-41,'
'KPP-42,' and 'KPP-43' and methods for using these proteins and their encoding polynucleotides for
the detection, diagnosis, and treatment of diseases and medical conditions. Embodiments also
provide methods for utilizing the purified kinases and phosphatases and/or their encoding
polynucleotides for facilitating the drug discovery process, including determination of efficacy,
dosage, toxicity, and pharmacology. Related embodiments provide methods for utilizing the purified
kinases and phosphatases and/or their encoding polynucleotides for investigating the pathogenesis of
diseases and medical conditions.

An embodiment provides an isolated polypeptide selected from the group consisting of a) a
polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID
NO:1-43, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical
or at least about 90% identical to an amino acid sequence selected from the group consisting of SEQ ID
NO:1-43, c) a biologically active fragment of a polypeptide having an amino acid sequence selected
from the group consisting of SEQ ID NO:1-43, and d) an immunogenic fragment of a polypeptide
having an amino acid sequence selected from the group consisting of SEQ ID NO:1-43. Another
embodiment provides an isolated polypeptide comprising an amino acid sequence of SEQ ID
NO:1-43.
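
The "at least 90% identical" criterion recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identity over all aligned columns that are not gaps in both sequences; the function names, the choice of denominator, and the example sequences are illustrative assumptions and are not the alignment algorithm or parameters used for the sequences of the invention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns that are gaps in both sequences are skipped; a gap
    opposite a residue counts as a compared, non-identical position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True when the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two hypothetical ten-residue peptides that differ at one position are 90% identical under this definition and would satisfy the threshold, whereas a single gap in a six-column alignment gives about 83%.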

Still another embodiment provides an isolated polynucleotide encoding a polypeptide
selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected
from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a naturally occurring
amino acid sequence at least 90% identical or at least about 90% identical to an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43, c) a biologically active fragment of a
polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-43,
and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NO:1-43. In another embodiment, the polynucleotide encodes a
polypeptide selected from the group consisting of SEQ ID NO:1-43. In an alternative embodiment,
the polynucleotide is selected from the group consisting of SEQ ID NO:44-86.

Still another embodiment provides a recombinant polynucleotide comprising a promoter
sequence operably linked to a polynucleotide encoding a polypeptide selected from the group
consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting
of SEQ ID NO:1-43, b) a polypeptide comprising a naturally occurring amino acid sequence at least
90% identical or at least about 90% identical to an amino acid sequence selected from the group
consisting of SEQ ID NO:1-43, c) a biologically active fragment of a polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID NO:1-43, and d) an immunogenic
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43. Another embodiment provides a cell transformed with the recombinant polynucleotide.
Yet another embodiment provides a transgenic organism comprising the recombinant polynucleotide.

Another embodiment provides a method for producing a polypeptide selected from the group
consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting
of SEQ ID NO:1-43, b) a polypeptide comprising a naturally occurring amino acid sequence at least
90% identical or at least about 90% identical to an amino acid sequence selected from the group
consisting of SEQ ID NO:1-43, c) a biologically active fragment of a polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID NO:1-43, and d) an immunogenic
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43. The method comprises a) culturing a cell under conditions suitable for expression of the
polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a
promoter sequence operably linked to a polynucleotide encoding the polypeptide, and b) recovering
the polypeptide so expressed.

Yet another embodiment provides an isolated antibody which specifically binds to a
polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid
sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a
naturally occurring amino acid sequence at least 90% identical or at least about 90% identical to an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a biologically active
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43.

Still yet another embodiment provides an isolated polynucleotide selected from the group
consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group
consisting of SEQ ID NO:44-86, b) a polynucleotide comprising a naturally occurring polynucleotide
sequence at least 90% identical or at least about 90% identical to a polynucleotide sequence selected
from the group consisting of SEQ ID NO:44-86, c) a polynucleotide complementary to the
polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA
equivalent of a)-d). In other embodiments, the polynucleotide can comprise at least about 20, 30, 40,
60, 80, or 100 contiguous nucleotides.
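
The relationship between a polynucleotide, its complement, and its RNA equivalent can be sketched as follows. This is only an illustration: it assumes unambiguous A/C/G/T input, writes the complementary strand 5' to 3' (that is, as the reverse complement), and takes "RNA equivalent" to mean the same linear sequence with thymine replaced by uracil. The function names and the example sequence are hypothetical.

```python
# Translation table for Watson-Crick base pairing (DNA, both letter cases).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand of `dna`, written 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Same linear sequence with every thymine replaced by uracil."""
    return dna.replace("T", "U").replace("t", "u")
```

Applied to the hypothetical sequence ATGC, the complementary strand is GCAT and the RNA equivalent is AUGC; taking the reverse complement twice returns the original sequence.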

Yet another embodiment provides a method for detecting a target polynucleotide in a sample,
said target polynucleotide being selected from the group consisting of a) a polynucleotide comprising
a polynucleotide sequence selected from the group consisting of SEQ ID NO:44-86, b) a
polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical or at
least about 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID
NO:44-86, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide
complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method
comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides
comprising a sequence complementary to said target polynucleotide in the sample, and which probe
specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization
complex is formed between said probe and said target polynucleotide or fragments thereof, and b)
detecting the presence or absence of said hybridization complex. In a related embodiment, the
method can include detecting the amount of the hybridization complex. In still other embodiments,
the probe can comprise at least about 20, 30, 40, 60, 80, or 100 contiguous nucleotides.

Still yet another embodiment provides a method for detecting a target polynucleotide in a
sample, said target polynucleotide being selected from the group consisting of a) a polynucleotide
comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:44-86, b) a
polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical or at
least about 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID
NO:44-86, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide
complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method
comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain
reaction amplification, and b) detecting the presence or absence of said amplified target
polynucleotide or fragment thereof. In a related embodiment, the method can include detecting the
amount of the amplified target polynucleotide or fragment thereof.

Another embodiment provides a composition comprising an effective amount of a
polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid
sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a
naturally occurring amino acid sequence at least 90% identical or at least about 90% identical to an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a biologically active
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43, and a pharmaceutically acceptable excipient.
In one embodiment, the composition can comprise an amino acid sequence selected from the group
consisting of SEQ ID NO:1-43. Other embodiments provide a method of treating a disease or
condition associated with decreased or abnormal expression of functional KPP, comprising
administering to a patient in need of such treatment the composition.

Yet another embodiment provides a method for screening a compound for effectiveness as an
agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino
acid sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a
naturally occurring amino acid sequence at least 90% identical or at least about 90% identical to an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a biologically active
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43. The method comprises a) contacting a
sample comprising the polypeptide with a compound, and b) detecting agonist activity in the sample.
Another embodiment provides a composition comprising an agonist compound identified by the
method and a pharmaceutically acceptable excipient. Yet another embodiment provides a method of
treating a disease or condition associated with decreased expression of functional KPP, comprising
administering to a patient in need of such treatment the composition.

Still yet another embodiment provides a method for screening a compound for effectiveness
as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide
comprising a naturally occurring amino acid sequence at least 90% identical or at least about 90%
identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a
biologically active fragment of a polypeptide having an amino acid sequence selected from the group
consisting of SEQ ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID NO:1-43. The method comprises a)
contacting a sample comprising the polypeptide with a compound, and b) detecting antagonist activity
in the sample. Another embodiment provides a composition comprising an antagonist compound
identified by the method and a pharmaceutically acceptable excipient. Yet another embodiment
provides a method of treating a disease or condition associated with overexpression of functional
KPP, comprising administering to a patient in need of such treatment the composition.

Another embodiment provides a method of screening for a compound that specifically binds
to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid
sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a
naturally occurring amino acid sequence at least 90% identical or at least about 90% identical to an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a biologically active
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43. The method comprises a) combining the
polypeptide with at least one test compound under suitable conditions, and b) detecting binding of the
polypeptide to the test compound, thereby identifying a compound that specifically binds to the
polypeptide.

Yet another embodiment provides a method of screening for a compound that modulates the
activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino
acid sequence selected from the group consisting of SEQ ID NO:1-43, b) a polypeptide comprising a
naturally occurring amino acid sequence at least 90% identical or at least about 90% identical to an
amino acid sequence selected from the group consisting of SEQ ID NO:1-43, c) a biologically active
fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ
ID NO:1-43, and d) an immunogenic fragment of a polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NO:1-43. The method comprises a) combining the
polypeptide with at least one test compound under conditions permissive for the activity of the
polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c)
comparing the activity of the polypeptide in the presence of the test compound with the activity of the
polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide
in the presence of the test compound is indicative of a compound that modulates the activity of the
polypeptide.

Still yet another embodiment provides a method for screening a compound for effectiveness
in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a
polynucleotide sequence selected from the group consisting of SEQ ID NO:44-86, the method
comprising a) contacting a sample comprising the target polynucleotide with a compound, b)
detecting altered expression of the target polynucleotide, and c) comparing the expression of the
target polynucleotide in the presence of varying amounts of the compound and in the absence of the
compound.

Another embodiment provides a method for assessing toxicity of a test compound, said
method comprising a) treating a biological sample containing nucleic acids with the test compound;
b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20
contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide
comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:44-86, ii) a
polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical or at
least about 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID
NO:44-86, iii) a polynucleotide having a sequence complementary to i), iv) a polynucleotide
complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Hybridization occurs
under conditions whereby a specific hybridization complex is formed between said probe and a target
polynucleotide in the biological sample, said target polynucleotide selected from the group consisting
of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of
SEQ ID NO:44-86, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at
least 90% identical or at least about 90% identical to a polynucleotide sequence selected from the
group consisting of SEQ ID NO:44-86, iii) a polynucleotide complementary to the polynucleotide of
i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of
i)-iv). Alternatively, the target polynucleotide can comprise a fragment of a polynucleotide selected
from the group consisting of i)-v) above; c) quantifying the amount of hybridization complex; and d)
comparing the amount of hybridization complex in the treated biological sample with the amount of
hybridization complex in an untreated biological sample, wherein a difference in the amount of
hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
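
Steps c) and d) of the toxicity method compare the quantified hybridization complex between treated and untreated samples. One way to express that comparison numerically is a log2 ratio of the two signals, as in the sketch below. The text does not specify how large a difference must be; the two-fold cutoff (`min_abs_log2=1.0`), the function names, and the signal values in the usage note are assumptions made purely for illustration.

```python
import math


def log2_fold_change(treated: float, untreated: float) -> float:
    """log2 of the treated/untreated hybridization signal ratio."""
    if treated <= 0 or untreated <= 0:
        raise ValueError("signals must be positive")
    return math.log2(treated / untreated)


def differs(treated: float, untreated: float, min_abs_log2: float = 1.0) -> bool:
    """True when the signals differ by at least the chosen cutoff.

    Assumption: the default cutoff of 1.0 corresponds to a two-fold
    increase or decrease and is not taken from the specification.
    """
    return abs(log2_fold_change(treated, untreated)) >= min_abs_log2
```

With hypothetical signals of 400 (treated) and 100 (untreated), the log2 fold change is 2 and the samples are flagged as different; signals of 110 and 100 are not.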

BRIEF DESCRIPTION OF THE TABLES 
Table 1 summarizes the nomenclature for full length polynucleotide and polypeptide
embodiments of the invention.

Table 2 shows the GenBank identification number and annotation of the nearest GenBank
homolog, and the PROTEOME database identification numbers and annotations of PROTEOME
database homologs, for polypeptide embodiments of the invention. The probability scores for the
matches between each polypeptide and its homolog(s) are also shown.

Table 3 shows structural features of polypeptide embodiments, including predicted motifs 
and domains, along with the methods, algorithms, and searchable databases used for analysis of the 
polypeptides. 

Table 4 lists the cDNA and/or genomic DNA fragments which were used to assemble
polynucleotide embodiments, along with selected fragments of the polynucleotides.

Table 5 shows representative cDNA libraries for polynucleotide embodiments. 

Table 6 provides an appendix which describes the tissues and vectors used for construction of 
the cDNA libraries shown in Table 5. 

Table 7 shows the tools, programs, and algorithms used to analyze polynucleotides and
polypeptides, along with applicable descriptions, references, and threshold parameters.

Table 8 shows single nucleotide polymorphisms found in polynucleotide sequences of the
invention, along with allele frequencies in different human populations.

DESCRIPTION OF THE INVENTION 

Before the present proteins, nucleic acids, and methods are described, it is understood that
embodiments of the invention are not limited to the particular machines, instruments, materials, and
methods described, as these may vary. It is also to be understood that the terminology used herein is
for the purpose of describing particular embodiments only, and is not intended to limit the scope of
the invention.

As used herein and in the appended claims, the singular forms "a," "an," and "the" include
plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a
host cell" includes a plurality of such host cells, and a reference to "an antibody" is a reference to one
or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same
meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
Although any machines, materials, and methods similar or equivalent to those described herein can be
used to practice or test the present invention, the preferred machines, materials and methods are now
described. All publications mentioned herein are cited for the purpose of describing and disclosing
the cell lines, protocols, reagents and vectors which are reported in the publications and which might
be used in connection with various embodiments of the invention. Nothing herein is to be construed
as an admission that the invention is not entitled to antedate such disclosure by virtue of prior
invention.
DEFINITIONS

"KPP" refers to the amino acid sequences of substantially purified KPP obtained from any
species, particularly a mammalian species, including bovine, ovine, porcine, murine, equine, and
human, and from any source, whether natural, synthetic, semi-synthetic, or recombinant.

The term "agonist" refers to a molecule which intensifies or mimics the biological activity of
KPP. Agonists may include proteins, nucleic acids, carbohydrates, small molecules, or any other
compound or composition which modulates the activity of KPP either by directly interacting with
KPP or by acting on components of the biological pathway in which KPP participates.

An "allelic variant" is an alternative form of the gene encoding KPP. Allelic variants may
result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in
polypeptides whose structure or function may or may not be altered. A gene may have none, one, or
many allelic variants of its naturally occurring form. Common mutational changes which give rise to
allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides.
Each of these types of changes may occur alone, or in combination with the others, one or more times
in a given sequence.

"Altered" nucleic acid sequences encoding KPP include those sequences with deletions,
insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as KPP or a
polypeptide with at least one functional characteristic of KPP. Included within this definition are
polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe
of the polynucleotide encoding KPP, and improper or unexpected hybridization to allelic variants,
with a locus other than the normal chromosomal locus for the polynucleotide encoding KPP. The
encoded protein may also be "altered," and may contain deletions, insertions, or substitutions of
amino acid residues which produce a silent change and result in a functionally equivalent KPP.
Deliberate amino acid substitutions may be made on the basis of one or more similarities in polarity,
charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as
long as the biological or immunological activity of KPP is retained. For example, negatively charged
amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may
include lysine and arginine. Amino acids with uncharged polar side chains having similar
hydrophilicity values may include: asparagine and glutamine; and serine and threonine. Amino acids
with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and
valine; glycine and alanine; and phenylalanine and tyrosine.
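
The residue groupings listed above can be written down as a small lookup, which makes it easy to ask whether a given substitution falls within one of the named groups. The sketch below encodes only the seven groups stated in the preceding paragraph, using one-letter amino acid codes; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the check says nothing about whether biological or immunological activity is actually retained.

```python
# Groups taken from the paragraph above (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    frozenset("DE"),   # negatively charged: aspartic acid, glutamic acid
    frozenset("KR"),   # positively charged: lysine, arginine
    frozenset("NQ"),   # uncharged polar: asparagine, glutamine
    frozenset("ST"),   # uncharged polar: serine, threonine
    frozenset("LIV"),  # uncharged: leucine, isoleucine, valine
    frozenset("GA"),   # uncharged: glycine, alanine
    frozenset("FY"),   # uncharged: phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues belong to the same listed group."""
    a, b = original.upper(), replacement.upper()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under this lookup, aspartic acid to glutamic acid (D to E) and leucine to valine (L to V) are within-group substitutions, whereas aspartic acid to lysine (D to K) is not.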

The terms "amino acid" and "amino acid sequence" can refer to an oligopeptide, a peptide, a
polypeptide, or a protein sequence, or a fragment of any of these, and to naturally occurring or
synthetic molecules. Where "amino acid sequence" is recited to refer to a sequence of a naturally
occurring protein molecule, "amino acid sequence" and like terms are not meant to limit the amino
acid sequence to the complete native amino acid sequence associated with the recited protein
molecule.

"Amplification" relates to the production of additional copies of a nucleic acid.
Amplification may be carried out using polymerase chain reaction (PCR) technologies or other
nucleic acid amplification technologies well known in the art.

The t^cm "antagonist" refers to a noolecule which inhibits or attenuates the biological activity 

of KPP. Antagonists may include protdns such as antibodies, anticalins, nucleic acids, 
30 carbohydrates, small molecules, or any other compound or conoposition which modulates the activity 

of KPP either by directly interacting with KPP or by acting on components of the biological pathway 

in which KPP participates. 

The term "antibody" refers to intact immunoglobulin molecules as well as to fragments
thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding an epitopic determinant.
Antibodies that bind KPP polypeptides can be prepared using intact polypeptides or using fragments
containing small peptides of interest as the immunizing antigen. The polypeptide or oligopeptide
used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of
RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly
used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin,
and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.

The term "antigenic determinant" refers to that region of a molecule (i.e., an epitope) that
makes contact with a particular antibody. When a protein or a fragment of a protein is used to
immunize a host animal, numerous regions of the protein may induce the production of antibodies
which bind specifically to antigenic determinants (particular regions or three-dimensional structures
on the protein). An antigenic determinant may compete with the intact antigen (i.e., the immunogen
used to elicit the immune response) for binding to an antibody.

The term "aptamer" refers to a nucleic acid or oligonucleotide molecule that binds to a
specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX
(Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No.
5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries.
Aptamer compositions may be double-stranded or single-stranded, and may include
deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules.
The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a
ribonucleotide may be replaced by 2'-F or 2'-NH2), which may improve a desired property, e.g.,
resistance to nucleases or longer lifetime in blood. Aptamers may be conjugated to other molecules,
e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system.
Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a
cross-linker (Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13).

The term "intramer" refers to an aptamer which is expressed in vivo. For example, a vaccinia
virus-based RNA expression system has been used to express specific RNA aptamers at high levels in
the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl. Acad. Sci. USA 96:3606-3610).

The term "spiegelmer" refers to an aptamer which includes L-DNA, L-RNA, or other
left-handed nucleotide derivatives or nucleotide-like molecules. Aptamers containing left-handed
nucleotides are resistant to degradation by naturally occurring enzymes, which normally act on
substrates containing right-handed nucleotides.

The term "antisense" refers to any composition capable of base-pairing with the "sense"
(coding) strand of a polynucleotide having a specific nucleic acid sequence. Antisense compositions
may include DNA; RNA; peptide nucleic acid (PNA); oligonucleotides having modified backbone
linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides
having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or
oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or
7-deaza-2'-deoxyguanosine. Antisense molecules may be produced by any method including chemical
synthesis or transcription. Once introduced into a cell, the complementary antisense molecule
base-pairs with a naturally occurring nucleic acid sequence produced by the cell to form duplexes
which block either transcription or translation. The designation "negative" or "minus" can refer to
the antisense strand, and the designation "positive" or "plus" can refer to the sense strand of a
reference DNA molecule.

The term "biologically active" refers to a protein having structural, regulatory, or biochemical
functions of a naturally occurring molecule. Likewise, "immunologically active" or "immunogenic"
refers to the capability of the natural, recombinant, or synthetic KPP, or of any oligopeptide thereof,
to induce a specific immune response in appropriate animals or cells and to bind with specific
antibodies.

"Complementary" describes the relationship between two single-stranded nucleic acid
sequences that anneal by base-pairing. For example, 5'-AGT-3' pairs with its complement,
3'-TCA-5'.
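
By way of illustration only, the base-pairing relationship in the example above can be sketched in
Python. The function names are hypothetical and are not part of the disclosure; the sketch assumes
unmodified DNA bases (A, C, G, T) and standard Watson-Crick pairing.

```python
# Watson-Crick pairing rules for unmodified DNA bases (assumption: no
# modified or ambiguous bases are present in the input).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq_5to3: str) -> str:
    """Return the complementary strand, read 3'->5' position by position."""
    return "".join(PAIRS[base] for base in seq_5to3.upper())


def reverse_complement(seq_5to3: str) -> str:
    """Return the complementary strand rewritten in the 5'->3' direction."""
    return complement(seq_5to3)[::-1]


print(complement("AGT"))          # TCA (read 3'->5', as in the example above)
print(reverse_complement("AGT"))  # ACT (the same strand read 5'->3')
```

The two functions return the same strand written in opposite directions; which form is convenient
depends on whether sequences are being compared position by position or stored 5' to 3'.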

A "composition comprising a given polynucleotide" and a "composition comprising a given
polypeptide" can refer to any composition containing the given polynucleotide or polypeptide. The
composition may comprise a dry formulation or an aqueous solution. Compositions comprising
polynucleotides encoding KPP or fragments of KPP may be employed as hybridization probes. The
probes may be stored in freeze-dried form and may be associated with a stabilizing agent such as a
carbohydrate. In hybridizations, the probe may be deployed in an aqueous solution containing salts
(e.g., NaCl), detergents (e.g., sodium dodecyl sulfate; SDS), and other components (e.g., Denhardt's
solution, dry milk, salmon sperm DNA, etc.).

"Consensus sequence" refers to a nucleic acid sequence which has been subjected to repeated
DNA sequence analysis to resolve uncalled bases, extended using the XL-PCR kit (Applied
Biosystems, Foster City CA) in the 5' and/or the 3' direction, and resequenced, or which has been
assembled from one or more overlapping cDNA, EST, or genomic DNA fragments using a computer
program for fragment assembly, such as the GELVIEW fragment assembly system (Accelrys,
Burlington MA) or Phrap (University of Washington, Seattle WA). Some sequences have been both
extended and assembled to produce the consensus sequence.

"Conservative amino acid substitutions" are those substitutions that are predicted to least
interfere with the properties of the original protein, i.e., the structure and especially the function of
the protein is conserved and not significantly changed by such substitutions. The table below shows
amino acids which may be substituted for an original amino acid in a protein and which are regarded
as conservative amino acid substitutions.

Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr

Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide
backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, 
(b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of 
the side chain. 
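
For illustration, the table of conservative substitutions may be held as a simple lookup. The
following Python sketch transcribes the table; the names CONSERVATIVE and is_conservative are
hypothetical, and the "Leu" key is an assumption inferred from the alphabetical position of the
row whose label is missing in the source text. The table is directional as written (for example,
Cys lists Ala, but Ala does not list Cys), and the sketch preserves that.

```python
# Conservative substitutions keyed by original residue (three-letter codes).
# Assumption: the unlabeled row between Ile and Lys in the table is Leu.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` for `original` (directional)."""
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Cys", "Ala"))  # True
print(is_conservative("Ala", "Cys"))  # False: the table is not symmetric
print(is_conservative("Pro", "Ala"))  # False: Pro has no row in the table
```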

A "deletion" refers to a change in the amino acid or nucleotide sequence that results in the 
absence of one or more amino acid residues or nucleotides. 

The term "derivative" refers to a chemically modified polynucleotide or polypeptide.
Chemical modifications of a polynucleotide can include, for example, replacement of hydrogen by an
alkyl, acyl, hydroxyl, or amino group. A derivative polynucleotide encodes a polypeptide which
retains at least one biological or immunological function of the natural molecule. A derivative
polypeptide is one modified by glycosylation, pegylation, or any similar process that retains at least
one biological or immunological function of the polypeptide from which it was derived.

A "detectable label" refers to a reporter molecule or enzyme that is capable of generating a
measurable signal and is covalently or noncovalently joined to a polynucleotide or polypeptide.

"Differential expression" refers to increased or upregulated; or decreased, downregulated, or
absent gene or protein expression, determined by comparing at least two different samples. Such
comparisons may be carried out between, for example, a treated and an untreated sample, or a
diseased and a normal sample.

"Exon shuffling" refers to the recombination of different coding regions (exons). Since an
exon may represent a structural or functional domain of the encoded protein, new proteins may be
assembled through the novel reassortment of stable substructures, thus allowing acceleration of the
evolution of new protein functions. 

A "fragment" is a unique portion of KPP or a polynucleotide encoding KPP which can be
identical in sequence to, but shorter in length than, the parent sequence. A fragment may comprise up
to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a
fragment may comprise from about 5 to about 1000 contiguous nucleotides or amino acid residues. A
fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least
5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous nucleotides or amino
acid residues in length. Fragments may be preferentially selected from certain regions of a molecule.
For example, a polypeptide fragment may comprise a certain length of contiguous amino acids
selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a
certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by
the specification, including the Sequence Listing, tables, and figures, may be encompassed by the
present embodiments.

A fragment of SEQ ID NO:44-86 can comprise a region of unique polynucleotide sequence
that specifically identifies SEQ ID NO:44-86, for example, as distinct from any other sequence in the
genome from which the fragment was obtained. A fragment of SEQ ID NO:44-86 can be employed
in one or more embodiments of methods of the invention, for example, in hybridization and
amplification technologies and in analogous methods that distinguish SEQ ID NO:44-86 from related
polynucleotides. The precise length of a fragment of SEQ ID NO:44-86 and the region of SEQ ID
NO:44-86 to which the fragment corresponds are routinely determinable by one of ordinary skill in
the art based on the intended purpose for the fragment.

A fragment of SEQ ID NO:1-43 is encoded by a fragment of SEQ ID NO:44-86. A fragment
of SEQ ID NO:1-43 can comprise a region of unique amino acid sequence that specifically identifies
SEQ ID NO:1-43. For example, a fragment of SEQ ID NO:1-43 can be used as an immunogenic
peptide for the development of antibodies that specifically recognize SEQ ID NO:1-43. The precise
length of a fragment of SEQ ID NO:1-43 and the region of SEQ ID NO:1-43 to which the fragment
corresponds can be determined based on the intended purpose for the fragment using one or more
analytical methods described herein or otherwise known in the art.

A "full length" polynucleotide is one containing at least a translation initiation codon (e.g.,
methionine) followed by an open reading frame and a translation termination codon. A "full length"
polynucleotide sequence encodes a "full length" polypeptide sequence.
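
As a non-limiting sketch, the three elements of a "full length" polynucleotide (initiation codon,
open reading frame, termination codon) can be checked programmatically. The Python function below
is hypothetical and rests on stated assumptions: the input is coding-strand DNA that begins exactly
at the initiation codon, the initiation codon is ATG (methionine), and the standard stop codons
TAA, TAG, and TGA apply.

```python
# Assumption: standard genetic code stop codons, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_full_length(cds: str) -> bool:
    """Check for ATG + uninterrupted open reading frame + terminal stop codon.

    Assumes `cds` is coding-strand DNA starting at the initiation codon,
    with no flanking untranslated sequence.
    """
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    starts_with_atg = codons[0] == "ATG"
    ends_with_stop = codons[-1] in STOP_CODONS
    no_internal_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return starts_with_atg and ends_with_stop and no_internal_stop


print(is_full_length("ATGGCCTAA"))     # True
print(is_full_length("ATGTAAGCCTAA"))  # False: in-frame stop before the end
print(is_full_length("GCCGCCTAA"))     # False: no initiation codon
```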

"Homology" refers to sequence similarity or, alternatively, sequence identity, between two or 
more polynucleotide sequences or two or more polypeptide sequences. 

The terms "percent identity" and "% identity," as applied to polynucleotide sequences, refer
to the percentage of identical nucleotide matches between at least two polynucleotide sequences
aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and
reproducible way, gaps in the sequences being compared in order to optimize alignment between two
sequences, and therefore achieve a more meaningful comparison of the two sequences.
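
To illustrate the calculation, percent identity can be computed from two sequences that have
already been aligned, with "-" marking an inserted gap. The Python sketch below is hypothetical and
does not reproduce any particular alignment program; it assumes the alignment has been produced
elsewhere and, as one possible convention, divides the number of identical positions by the total
number of alignment columns (other conventions for the denominator exist).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumptions: inputs are pre-aligned, equal in length, and use '-' for gaps;
    the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# 7 identical columns out of 9 (one gap column, one mismatch).
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 1))  # 77.8
```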

Percent identity between polynucleotide sequences may be determined using one or more
computer algorithms or programs known in the art or described herein. For example, percent identity
can be determined using the default parameters of the CLUSTAL V algorithm as incorporated into
the MEGALIGN version 3.12e sequence alignment program. This program is part of the
LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR,
Madison WI). CLUSTAL V is described in Higgins, D.G. and P.M. Sharp (1989; CABIOS 5:151-153)
and in Higgins, D.G. et al. (1992; CABIOS 8:189-191). For pairwise alignments of
polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5,
window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default.

Alternatively, a suite of commonly used and freely available sequence comparison algorithms
which can be used is provided by the National Center for Biotechnology Information (NCBI) Basic
Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410),
which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at
ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs
including "blastn," that is used to align a known polynucleotide sequence with other polynucleotide
sequences from a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is
used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be
accessed and used interactively at ncbi.nlm.nih.gov/gorf/bl2.html. The "BLAST 2 Sequences" tool
can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with
gap and other parameters set to default settings. For example, to compare two nucleotide sequences,
one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.12 (April-21-2000) set at default
parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap X drop-off: 50
Expect: 10
Word Size: 11
Filter: on

Percent identity may be measured over the length of an entire defined sequence, for example,
as defined by a particular SEQ ID number, or may be measured over a shorter length, for example,
over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at
least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous
nucleotides. Such lengths are exemplary only, and it is understood that any fragment length
supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to
describe a length over which percentage identity may be measured.

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode
similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes
in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid
sequences that all encode substantially the same protein.
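
The effect of degeneracy can be shown with a short worked example. The Python sketch below is
illustrative only: the two sequences are invented for the example, the codon table is a small
excerpt of the standard genetic code, and the names are hypothetical. The two polynucleotides share
well under half of their positions yet encode the same peptide.

```python
# Excerpt of the standard genetic code (DNA codons -> one-letter amino acids).
# Only the codons needed for this example are included.
CODON_TABLE = {
    "ATG": "M",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA codon by codon using CODON_TABLE."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical sequences chosen to use different synonymous codons.
seq_a = "ATGCTTTCTCGT"  # ATG CTT TCT CGT
seq_b = "ATGTTGAGCAGA"  # ATG TTG AGC AGA

matches = sum(a == b for a, b in zip(seq_a, seq_b))
nucleotide_identity = 100.0 * matches / len(seq_a)

print(translate(seq_a), translate(seq_b))  # MLSR MLSR
print(round(nucleotide_identity, 1))       # 41.7
```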

The phrases "percent identity" and "% identity," as applied to polypeptide sequences, refer to
the percentage of identical residue matches between at least two polypeptide sequences aligned using
a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some
alignment methods take into account conservative amino acid substitutions. Such conservative
substitutions, explained in more detail above, generally preserve the charge and hydrophobicity at the
site of substitution, thus preserving the structure (and therefore function) of the polypeptide. The
phrases "percent similarity" and "% similarity," as applied to polypeptide sequences, refer to the
percentage of residue matches, including identical residue matches and conservative substitutions,
between at least two polypeptide sequences aligned using a standardized algorithm. In contrast,
conservative substitutions are not included in the calculation of percent identity between polypeptide
sequences.
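
The distinction between percent identity and percent similarity can be sketched as follows. This
Python example is hypothetical: it uses the conservative substitution table given earlier in this
section, rewritten in one-letter codes, treats a substitution as conservative if the table lists it
in either direction (an assumption made for this sketch), and divides by the full alignment length.
Alignment programs in practice typically score similarity with a substitution matrix rather than a
fixed table.

```python
# Conservative substitutions from the table earlier in this section,
# in one-letter codes (original residue -> allowed replacements).
CONSERVATIVE = {
    "A": "GS", "R": "HK", "N": "DQH", "D": "NE", "C": "AS",
    "Q": "NEH", "E": "DQH", "G": "A", "H": "NRQE", "I": "LV",
    "L": "IV", "K": "RQE", "M": "LI", "F": "HMLWY", "S": "CT",
    "T": "SV", "W": "FY", "Y": "HFW", "V": "ILT",
}


def is_conservative(a: str, b: str) -> bool:
    # Assumption: accept the pair if the table lists it in either direction.
    return b in CONSERVATIVE.get(a, "") or a in CONSERVATIVE.get(b, "")


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) for a pairwise alignment."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * similar / n


ident, simil = identity_and_similarity("MKVLSP", "MRILTG")
print(round(ident, 1), round(simil, 1))  # 33.3 83.3
```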

Percent identity between polypeptide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e
sequence alignment program (described and referenced above). For pairwise alignments of
polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap
penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default
residue weight table.

Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise
comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version
2.0.12 (April-21-2000) with blastp set at default parameters. Such default parameters may be, for
example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalties
Gap X drop-off: 50
Expect: 10
Word Size: 3
Filter: on 

Percent identity may be measured over the length of an entire defined polypeptide sequence,
for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for
example, over the length of a fragment taken from a larger, defined polypeptide sequence, for
instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least
150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment
length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be
used to describe a length over which percentage identity may be measured.

"Human artificial chromosomes" (HACs) are linear microchromosomes which may contain
DNA sequences of about 6 kb to 10 Mb in size and which contain all of the elements required for
chromosome replication, segregation and maintenance.

The term "humanized antibody" refers to an antibody molecule in which the amino acid
sequence in the non-antigen binding regions has been altered so that the antibody more closely
resembles a human antibody, and still retains its original binding ability.

"Hybridization" refers to the process by which a polynucleotide strand anneals with a
complementary strand through base pairing under defined hybridization conditions. Specific
hybridization is an indication that two nucleic acid sequences share a high degree of
complementarity. Specific hybridization complexes form under permissive annealing conditions and
remain hybridized after the "washing" step(s). The washing step(s) is particularly important in
determining the stringency of the hybridization process, with more stringent conditions allowing less
non-specific binding, i.e., binding between pairs of nucleic acid strands that are not perfectly
matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable
by one of ordinary skill in the art and may be consistent among hybridization experiments, whereas
wash conditions may be varied among experiments to achieve the desired stringency, and therefore
hybridization specificity. Permissive annealing conditions occur, for example, at 68°C in the
presence of about 6 x SSC, about 1% (w/v) SDS, and about 100 µg/ml sheared, denatured salmon
sperm DNA.

Generally, stringency of hybridization is expressed, in part, with reference to the temperature
under which the wash step is carried out. Such wash temperatures are typically selected to be about
5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic
strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of
the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and
conditions for nucleic acid hybridization are well known and can be found in Sambrook, J. and D.W.
Russell (2001; Molecular Cloning: A Laboratory Manual, 3rd ed., vol. 1-3, Cold Spring Harbor Press,
Cold Spring Harbor NY, ch. 9).
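
By way of a rough illustration, the wash temperature window of about 5°C to 20°C below Tm can be
computed once a Tm estimate is available. The Python sketch below uses the simple "Wallace rule"
(2°C per A or T, 4°C per G or C), which is a common rule of thumb for short oligonucleotides. It is
used here as an assumption for the example, is not necessarily the equation given in the cited
manual, and ignores ionic strength, pH, and probe length effects. The function names are
hypothetical.

```python
from typing import Tuple


def wallace_tm(oligo: str) -> float:
    """Rule-of-thumb Tm in degrees C for a short oligonucleotide.

    Assumption: Tm = 2*(A+T) + 4*(G+C); salt, pH, and formamide are ignored.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2.0 * at + 4.0 * gc


def wash_window(tm_c: float) -> Tuple[float, float]:
    """Return (lowest, highest) wash temperature: 20 to 5 degrees C below Tm."""
    return (tm_c - 20.0, tm_c - 5.0)


probe = "AGCTAGCTAGCTAGCTAG"  # hypothetical 18-mer: 9 A/T and 9 G/C
tm = wallace_tm(probe)
print(tm)               # 54.0
print(wash_window(tm))  # (34.0, 49.0)
```

Washing nearer the top of the window corresponds to higher stringency, consistent with the
description above.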

High stringency conditions for hybridization between polynucleotides of the present
invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS,
for 1 hour. Alternatively, temperatures of about 65°C, 60°C, 55°C, or 42°C may be used. SSC
concentration may be varied from about 0.1 to 2 x SSC, with SDS being present at about 0.1%.
Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents
include, for instance, sheared and denatured salmon sperm DNA at about 100-200 µg/ml. Organic
solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular
circumstances, such as for RNA:DNA hybridizations. Useful variations on these wash conditions
will be readily apparent to those of ordinary skill in the art. Hybridization, particularly under high
stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such
similarity is strongly indicative of a similar role for the nucleotides and their encoded polypeptides.
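
For orientation, the SSC strengths recited above can be converted to salt concentrations. The
Python sketch below assumes the conventional recipe in which 1 x SSC is 0.15 M sodium chloride and
0.015 M sodium citrate; that recipe is not stated in the present text and is labeled here as an
assumption. The dictionary and function names are hypothetical.

```python
# Assumption: conventional 1 x SSC = 150 mM sodium chloride + 15 mM sodium citrate.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}

# High stringency wash conditions as recited in the paragraph above.
HIGH_STRINGENCY_WASH = {
    "temperature_c": 68,
    "ssc_x": 0.2,
    "sds_percent": 0.1,
    "duration_h": 1,
}


def ssc_millimolar(strength_x: float) -> dict:
    """Millimolar concentration of each SSC solute at the given strength."""
    return {solute: round(strength_x * mm, 3) for solute, mm in SSC_1X_MM.items()}


print(ssc_millimolar(HIGH_STRINGENCY_WASH["ssc_x"]))
# {'sodium chloride': 30.0, 'sodium citrate': 3.0}
print(ssc_millimolar(6))  # permissive annealing at about 6 x SSC
# {'sodium chloride': 900.0, 'sodium citrate': 90.0}
```

Lower salt at a given temperature destabilizes imperfect duplexes, which is why the high
stringency wash uses a much lower SSC strength than the annealing step.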

The term "hybridization complex" refers to a complex formed between two nucleic acids by
virtue of the formation of hydrogen bonds between complementary bases. A hybridization complex
may be formed in solution (e.g., Cot or Rot analysis) or formed between one nucleic acid present in
solution and another nucleic acid immobilized on a solid support (e.g., paper, membranes, filters,
chips, pins or glass slides, or any other appropriate substrate to which cells or their nucleic acids have
been fixed).

The words "insertion" and "addition" refer to changes in an amino acid or polynucleotide 
sequence resulting in the addition of one or more amino acid residues or nucleotides, respectively. 

"Immune response" can refer to conditions associated with inflammation, trauma, immune
disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression
of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect
cellular and systemic defense systems.

An "immunogenic fragment" is a polypeptide or oligopeptide fragment of KPP which is
capable of eliciting an immune response when introduced into a living organism, for example, a
mammal. The term "immunogenic fragment" also includes any polypeptide or oligopeptide fragment
of KPP which is useful in any of the antibody production methods disclosed herein or known in the
art.

The term "microarray" refers to an arrangement of a plurality of polynucleotides,
polypeptides, antibodies, or other chemical compounds on a substrate.

The terms "element" and "array element" refer to a polynucleotide, polypeptide, antibody, or
other chemical compound having a unique and defined position on a microarray.

The term "modulate" refers to a change in the activity of KPP. For example, modulation may
cause an increase or a decrease in protein activity, binding characteristics, or any other biological,
functional, or immunological properties of KPP.

The phrases "nucleic acid" and "nucleic acid sequence" refer to a nucleotide, oligonucleotide,
polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or
synthetic origin which may be single-stranded or double-stranded and may represent the sense or the
antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.

"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a
functional relationship with a second nucleic acid sequence. For instance, a promoter is operably
linked to a coding sequence if the promoter affects the transcription or expression of the coding
sequence. Operably linked DNA sequences may be in close proximity or contiguous and, where
necessary to join two protein coding regions, in the same reading frame.

"Peptide nucleic acid" (PNA) refers to an antisense molecule or anti-gene agent which
comprises an oligonucleotide of at least about 5 nucleotides in length linked to a peptide backbone of
amino acid residues ending in lysine. The terminal lysine confers solubility to the composition.
PNAs preferentially bind complementary single stranded DNA or RNA and stop transcript
elongation, and may be pegylated to extend their lifespan in the cell.

"Post-translational modification" of a KPP may involve lipidation, glycosylation,
phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in
the art. These processes may occur synthetically or biochemically. Biochemical modifications will
vary by cell type depending on the enzymatic milieu of KPP.

"Probe" refers to nucleic acids encoding KPP, their complements, or fragments thereof,
which are used to detect identical, allelic or related nucleic acids. Probes are isolated
oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical
labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. "Primers" are
short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide
by complementary base-pairing. The primer may then be extended along the target DNA strand by a
DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic
acid, e.g., by the polymerase chain reaction (PCR).

Probes and primers as used in the present invention typically comprise at least 15 contiguous
nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also
be employed, such as probes and primers that comprise at least 20, 25, 30, 40, 50, 60, 70, 80, 90, 100,
or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers
may be considerably longer than these examples, and it is understood that any length supported by the
specification, including the tables, figures, and Sequence Listing, may be used.
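
As an illustrative sketch, the length criterion described above (at least 15 contiguous nucleotides
of a known sequence) can be tested with a sliding window. The Python function and the example
sequence below are hypothetical; the check considers only the given strand and does not test the
complementary strand or hybridization behavior.

```python
def shares_contiguous_run(candidate: str, known: str, min_len: int = 15) -> bool:
    """True if `candidate` contains at least `min_len` contiguous nucleotides
    that also occur contiguously in `known` (same strand only)."""
    candidate, known = candidate.upper(), known.upper()
    if len(candidate) < min_len:
        return False
    return any(
        candidate[i:i + min_len] in known
        for i in range(len(candidate) - min_len + 1)
    )


# Hypothetical 39-nucleotide "known" sequence, invented for this example.
known = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

print(shares_contiguous_run(known[5:25], known))  # True: 20-mer taken from known
print(shares_contiguous_run("A" * 20, known))     # False: no shared 15-mer
print(shares_contiguous_run(known[:10], known))   # False: shorter than 15
```

Raising min_len to 20, 25, or more corresponds to the longer probes and primers described above
for enhanced specificity.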

Methods for preparing and using probes and primers are described in, for example,
Sambrook, J. and D.W. Russell (2001; Molecular Cloning: A Laboratory Manual, 3rd ed., vol. 1-3,
Cold Spring Harbor Press, Cold Spring Harbor NY), Ausubel, F.M. et al. (1999; Short Protocols in
Molecular Biology, 4th ed., John Wiley & Sons, New York NY), and Innis, M. et al. (1990; PCR
Protocols, A Guide to Methods and Applications, Academic Press, San Diego CA). PCR primer pairs
can be derived from a known sequence, for example, by using computer programs intended for that
purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge
MA).

Oligonucleotides for use as primers are selected using software known in the art for such
purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to
100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to
5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer
selection programs have incorporated additional features for expanded capabilities. For example, the
PrimOU primer selection program (available to the public from the Genome Center at University of
Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from
megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3
primer selection program (available to the public from the Whitehead Institute/MIT Center for
Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which
sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the
selection of oligonucleotides for microarrays. (The source code for the latter two primer selection
programs may also be obtained from their respective sources and modified to meet the user's specific
needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping
Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments,
thereby allowing selection of primers that hybridize to either the most conserved or least conserved
regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both
unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and
polynucleotide fragments identified by any of the above selection methods are useful in hybridization
technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to
identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of
oligonucleotide selection are not limited to those described above.

A "recombinant nucleic acid" is a nucleic acid that is not naturally occurring or has a
sequence that is made by an artificial combination of two or more otherwise separated segments of
sequence. This artificial combination is often accomplished by chemical synthesis or, more
commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic
engineering techniques such as those described in Sambrook and Russell (supra). The term
recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion
of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid
sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a
vector that is used, for example, to transform a cell.

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a
vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is
expressed, inducing a protective immunological response in the mammal.

A "regulatory element" refers to a nucleic acid sequence usually derived from untranslated
regions of a gene and includes enhancers, promoters, introns, and 5' and 3' untranslated regions
(UTRs). Regulatory elements interact with host or viral proteins which control transcription,
translation, or RNA stability.

"Reporter molecules" are chemical or biochemical moieties used for labeling a nucleic acid,
amino acid, or antibody. Reporter molecules include radionuclides; enzymes; fluorescent,
chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and
other moieties known in the art.

An "RNA equivalent," in reference to a DNA molecule, is composed of the same linear
sequence of nucleotides as the reference DNA molecule with the exception that all occurrences of the
nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose
instead of deoxyribose.
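
As an illustration of this definition, the following Python sketch converts a DNA string to its RNA equivalent by replacing thymine (T) with uracil (U). The function name rna_equivalent and the restriction to the letters A, C, G, T, and N are assumptions made for the example. The change of sugar backbone from deoxyribose to ribose has no representation in a plain sequence string and is therefore not modeled.

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence (T -> U, case preserved)."""
    allowed = set("ACGTNacgtn")  # assumed alphabet for this sketch
    unexpected = set(dna) - allowed
    if unexpected:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(unexpected)}")
    # Only thymine changes; every other base keeps its position and identity.
    return dna.translate(str.maketrans("Tt", "Uu"))


print(rna_equivalent("ATGGCCTTA"))  # AUGGCCUUA
```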

The term "sample" is used in its broadest sense. A sample suspected of containing KPP,
nucleic acids encoding KPP, or fragments thereof may comprise a bodily fluid; an extract from a cell,
chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in
solution or bound to a substrate; a tissue; a tissue print; etc.

The terms "specific binding" and "specifically binding" refer to that interaction between a
protein or peptide and an agonist, an antibody, an antagonist, a small molecule, or any natural or
synthetic binding composition. The interaction is dependent upon the presence of a particular
structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding
molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide
comprising the epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A
and the antibody will reduce the amount of labeled A that binds to the antibody.

The term "substantially purified" refers to nucleic acid or amino acid sequences that are
removed from their natural environment and are isolated or separated, and are at least about 60% free,
preferably at least about 75% free, and most preferably at least about 90% free from other
components with which they are naturally associated.

A "substitution" refers to the replacement of one or more amino acid residues or nucleotides
by different amino acid residues or nucleotides, respectively.

"Substrate" refers to any suitable rigid or semi-rigid support including membranes, filters,
chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers,
microparticles and capillaries. The substrate can have a variety of surface forms, such as wells,
trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.

A "transcript image" or "expression profile" refers to the collective pattern of gene
expression by a particular cell type or tissue under given conditions at a given time.

"Transformation" describes a process by which exogenous DNA is introduced into a recipient
cell. Transformation may occur under natural or artificial conditions according to various methods
well known in the art, and may rely on any known method for the insertion of foreign nucleic acid
sequences into a prokaryotic or eukaryotic host cell. The method for transformation is selected based
on the type of host cell being transformed and may include, but is not limited to, bacteriophage or
viral infection, electroporation, heat shock, lipofection, and particle bombardment. The term
"transformed cells" includes stably transformed cells in which the inserted DNA is capable of
replication either as an autonomously replicating plasmid or as part of the host chromosome, as well
as transiently transformed cells which express the inserted DNA or RNA for limited periods of time.

A "transgenic organism," as used herein, is any organism, including but not limited to
animals and plants, in which one or more of the cells of the organism contains heterologous nucleic
acid introduced by way of human intervention, such as by transgenic techniques well known in the
art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor
of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with
a recombinant virus. In another embodiment, the nucleic acid can be introduced by infection with a
recombinant viral vector, such as a lentiviral vector (Lois, C. et al. (2002) Science 295:868-872). The
term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but
rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms
contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, plants
and animals. The isolated DNA of the present invention can be introduced into the host by methods
known in the art, for example infection, transfection, transformation or transconjugation. Techniques
for transferring the DNA of the present invention into such organisms are widely known and
provided in references such as Sambrook and Russell (supra).

A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having
at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of
the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 50%, at
least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least
93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater
sequence identity over a certain defined length. A variant may be described as, for example, an
"allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have
significant identity to a reference molecule, but will generally have a greater or lesser number of
polynucleotides due to alternate splicing during mRNA processing. The corresponding polypeptide
may possess additional functional domains or lack domains that are present in the reference molecule.
Species variants are polynucleotides that vary from one species to another. The resulting
polypeptides will generally have significant amino acid identity relative to each other. A
polymorphic variant is a variation in the polynucleotide sequence of a particular gene between
individuals of a given species. Polymorphic variants also may encompass "single nucleotide
polymorphisms" (SNPs) in which the polynucleotide sequence varies by one nucleotide base. The
presence of SNPs may be indicative of, for example, a certain population, a disease state, or a
propensity for a disease state.
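
The percent identity threshold in the definition above can be illustrated with a minimal Python sketch. It does not run blastn or the "BLAST 2 Sequences" tool. Instead it assumes the two sequences have already been aligned, with '-' marking gaps, and counts identical columns over the alignment length. The function names, the example sequences, and the gap convention are hypothetical and chosen only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues agree and neither is a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_variant_threshold(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """True if the aligned pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Ten aligned columns, eight identical: one mismatch and one gap.
print(percent_identity("ATGCCGTA-A", "ATGACGTAGA"))  # 80.0
```

The default threshold of 40.0 mirrors the minimum stated in the definition; the higher levels listed there can be tested by passing a different threshold.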

A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having
at least 40% sequence identity or sequence similarity to the particular polypeptide sequence over a
certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool
Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for
example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least
91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%,
or at least 99% or greater sequence identity or sequence similarity over a certain defined length of one
of the polypeptides.



THE INVENTION

Various embodiments of the invention include new human kinases and phosphatases (KPP), 
the polynucleotides encoding KPP, and the use of these compositions for the diagnosis, treatment, or 
prevention of cardiovascular diseases, immune system disorders, neurological disorders, disorders 
affecting growth and development, lipid disorders, cell proliferative disorders, and cancers. 

Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide
embodiments of the invention. Each polynucleotide and its corresponding polypeptide are correlated
to a single Incyte project identification number (Incyte Project ID). Each polypeptide sequence is
denoted by both a polypeptide sequence identification number (Polypeptide SEQ ID NO:) and an
Incyte polypeptide sequence number (Incyte Polypeptide ID) as shown. Each polynucleotide
sequence is denoted by both a polynucleotide sequence identification number (Polynucleotide SEQ
ID NO:) and an Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) as
shown. Column 6 shows the Incyte ID numbers of physical, full length clones corresponding to the
polypeptide and polynucleotide sequences of the invention. The full length clones encode
polypeptides which have at least 95% sequence identity to the polypeptide sequences shown in
column 3.

Table 2 shows sequences with homology to polypeptide embodiments of the invention as
identified by BLAST analysis against the GenBank protein (genpept) database and the PROTEOME
database. Columns 1 and 2 show the polypeptide sequence identification number (Polypeptide SEQ
ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for
polypeptides of the invention. Column 3 shows the GenBank identification number (GenBank ID
NO:) of the nearest GenBank homolog and the PROTEOME database identification numbers
(PROTEOME ID NO:) of the nearest PROTEOME database homologs. Column 4 shows the
probability scores for the matches between each polypeptide and its homolog(s). Column 5 shows the
annotation of the GenBank and PROTEOME database homolog(s) along with relevant citations
where applicable, all of which are expressly incorporated by reference herein.

Table 3 shows various structural features of the polypeptides of the invention. Columns 1
and 2 show the polypeptide sequence identification number (SEQ ID NO:) and the corresponding
Incyte polypeptide sequence number (Incyte Polypeptide ID) for each polypeptide of the invention.
Column 3 shows the number of amino acid residues in each polypeptide. Column 4 shows amino
acid residues comprising signature sequences, domains, motifs, potential phosphorylation sites, and
potential glycosylation sites. Column 5 shows analytical methods for protein structure/function
analysis and in some cases, searchable databases to which the analytical methods were applied.

Together, Tables 2 and 3 summarize the properties of polypeptides of the invention, and these
properties establish that the claimed polypeptides are kinases and phosphatases. For example, SEQ
ID NO:11 is 78% identical, from residue M1 to residue W1219, to mouse NIK (GenBank ID
g1872546) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The
BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide
sequence alignment by chance. SEQ ID NO:11 also has homology to proteins that activate the c-Jun
N-terminal kinase (Mapk8) signaling pathway, and are mitogen-activated protein kinase kinase kinase
kinases (MAP4K), as determined by BLAST analysis using the PROTEOME database. SEQ ID
NO:11 also contains a CNH domain, a protein kinase domain, a domain found in NIK1-like kinases,
and a serine/threonine kinase catalytic domain, as determined by searching for statistically significant
matches in the hidden Markov model (HMM)-based PFAM and SMART databases of conserved
protein families/domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN
analyses, and BLAST analyses against the PRODOM and DOMO databases, provide further
corroborative evidence that SEQ ID NO:11 is a protein kinase.

As another example, SEQ ID NO:15 is 99% identical, from residue E124 to residue 1750, to
human lymphoid phosphatase LyP1 (GenBank ID g4100632) as determined by the Basic Local
Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which
indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ
ID NO:15 also has homology to proteins that may be involved in T-cell development and are required
for B-cell antigen receptor-mediated growth arrest and apoptosis and are protein tyrosine phosphatase
non-receptors, as determined by BLAST analysis using the PROTEOME database. SEQ ID NO:15
also contains a protein-tyrosine phosphatase domain, a protein-tyrosine phosphatase catalytic domain,
and a protein-tyrosine phosphatase catalytic motif domain as determined by searching for statistically
significant matches in the hidden Markov model (HMM)-based SMART and PFAM databases of
conserved protein families/domains. (See Table 3.) Data from BLIMPS, MOTIFS, and
PROFILESCAN analyses, and BLAST analyses against the PRODOM and DOMO databases,
provide further corroborative evidence that SEQ ID NO:15 is a protein-tyrosine phosphatase.

As another example, SEQ ID NO:24 is 99% identical, from residue M1 to residue K487, to
human apyrase (GenBank ID g4583675) as determined by the Basic Local Alignment Search Tool
(BLAST). (See Table 2.) The BLAST probability score is 3.7e-264, which indicates the probability
of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:24 also has
homology to proteins that are localized to the lysosomal/autophagic vacuoles and are apyrase
proteins, as determined by BLAST analysis using the PROTEOME database. SEQ ID NO:24 also
contains a GDA1/CD39 (nucleoside phosphatase family) domain as determined by searching for
statistically significant matches in the hidden Markov model (HMM)-based PFAM database of
conserved protein families/domains. (See Table 3.) Data from BLIMPS and BLAST analyses against
the PRODOM and DOMO databases, provide further corroborative evidence that SEQ ID NO:24 is a
nucleoside phosphatase.

As another example, SEQ ID NO:27 is 97% identical, from residue M1 to residue G76, to
human SKRP1 (GenBank ID g18148911) as determined by the Basic Local Alignment Search Tool
(BLAST). (See Table 2.) The BLAST probability score is 5.7e-35, which indicates the probability of
obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:27 also has
homology to proteins that dephosphorylate phosphotyrosine and phosphoserine, inactivate MAPK,
and are proteins containing two dual specificity phosphatase catalytic domains, as determined by
BLAST analysis using the PROTEOME database. Data from BLIMPS analyses provide further
corroborative evidence that SEQ ID NO:27 is a dual specificity phosphatase.

As another example, SEQ ID NO:28 is 98% identical, from residue M1 to residue S449, to
human protein phosphatase 4 regulatory subunit 2 (GenBank ID g8250239) as determined by the
Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is
1.4E-241, which indicates the probability of obtaining the observed polypeptide sequence alignment by
chance. SEQ ID NO:28 also has homology to human protein phosphatase 4 regulatory subunit 2, as
determined by BLAST analysis using the PROTEOME database. The foregoing provide evidence
that SEQ ID NO:28 is a protein phosphatase regulatory subunit.

As another example, SEQ ID NO:34 is 93% identical, from residue E39 to residue 1490, to
human multifunctional calcium/calmodulin-dependent protein kinase II delta2 isoform (GenBank ID
g4426595) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The
BLAST probability score is 9.0e-255, which indicates the probability of obtaining the observed
polypeptide sequence alignment by chance. SEQ ID NO:34 also has homology to calcium-calmodulin
dependent protein kinase II delta, a member of the multifunctional CAMKII family
involved in Ca2+ regulated processes, of which the alternative form delta 3 is specifically upregulated
in the myocardium of patients with heart failure, as determined by BLAST analysis using the
PROTEOME database. SEQ ID NO:34 also contains a protein kinase domain and a serine/threonine
protein kinase catalytic domain as determined by searching for statistically significant matches in the
hidden Markov model (HMM)-based PFAM and SMART databases of conserved protein
families/domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses, and
BLAST analyses against the PRODOM and DOMO databases, provide further corroborative evidence
that SEQ ID NO:34 is a calcium-calmodulin dependent protein kinase. The foregoing provides
evidence that SEQ ID NO:34 is a calcium-calmodulin dependent protein kinase.

SEQ ID NO:1-10, SEQ ID NO:12-14, SEQ ID NO:16-23, SEQ ID NO:25-26, SEQ ID
NO:29-33, and SEQ ID NO:35-43 were analyzed and annotated in a similar manner. The algorithms
and parameters for the analysis of SEQ ID NO:1-43 are described in Table 7.

As shown in Table 4, the full length polynucleotide embodiments were assembled using
cDNA sequences or coding (exon) sequences derived from genomic DNA, or any combination of
these two types of sequences. Column 1 lists the polynucleotide sequence identification number
(Polynucleotide SEQ ID NO:), the corresponding Incyte polynucleotide consensus sequence number
(Incyte ID) for each polynucleotide of the invention, and the length of each polynucleotide sequence
in basepairs. Column 2 shows the nucleotide start (5') and stop (3') positions of the cDNA and/or
genomic sequences used to assemble the full length polynucleotide embodiments, and of fragments of
the polynucleotides which are useful, for example, in hybridization or amplification technologies that
identify SEQ ID NO:44-86 or that distinguish between SEQ ID NO:44-86 and related
polynucleotides.

The polynucleotide fragments described in Column 2 of Table 4 may refer specifically, for
example, to Incyte cDNAs derived from tissue-specific cDNA libraries or from pooled cDNA
libraries. Alternatively, the polynucleotide fragments described in column 2 may refer to GenBank
cDNAs or ESTs which contributed to the assembly of the full length polynucleotides. In addition, the
polynucleotide fragments described in column 2 may identify sequences derived from the ENSEMBL
(The Sanger Centre, Cambridge, UK) database (i.e., those sequences including the designation
"ENST"). Alternatively, the polynucleotide fragments described in column 2 may be derived from
the NCBI RefSeq Nucleotide Sequence Records Database (i.e., those sequences including the
designation "NM" or "NT") or the NCBI RefSeq Protein Sequence Records (i.e., those sequences
including the designation "NP"). Alternatively, the polynucleotide fragments described in column 2
may refer to assemblages of both cDNA and Genscan-predicted exons brought together by an "exon
stitching" algorithm. For example, a polynucleotide sequence identified as
FL_XXXXXX_N1_N2_YYYYY_N3_N4 represents a "stitched" sequence in which XXXXXX is the
identification number of the cluster of sequences to which the algorithm was applied, and YYYYY is
the number of the prediction generated by the algorithm, and N1,2,3..., if present, represent specific
exons that may have been manually edited during analysis (See Example V). Alternatively, the
polynucleotide fragments in column 2 may refer to assemblages of exons brought together by an
"exon-stretching" algorithm. For example, a polynucleotide sequence identified as
FLXXXXXX_gAAAAA_gBBBBB_1_N is a "stretched" sequence, with XXXXXX being the Incyte
project identification number, gAAAAA being the GenBank identification number of the human
genomic sequence to which the "exon-stretching" algorithm was applied, gBBBBB being the
GenBank identification number or NCBI RefSeq identification number of the nearest GenBank
protein homolog, and N referring to specific exons (See Example V). In instances where a RefSeq
sequence was used as a protein homolog for the "exon-stretching" algorithm, a RefSeq identifier
(denoted by "NM," "NP," or "NT") may be used in place of the GenBank identifier (i.e., gBBBBB).
Alternatively, a prefix identifies component sequences that were hand-edited, predicted from
genomic DNA sequences, or derived from a combination of sequence analysis methods. The
following Table lists examples of component sequence prefixes and corresponding sequence analysis
methods associated with the prefixes (see Example IV and Example V).



Prefix            Type of analysis and/or examples of programs

GNN, GFG, ENST    Exon prediction from genomic sequences using, for example,
                  GENSCAN (Stanford University, CA, USA) or FGENES
                  (Computer Genomics Group, The Sanger Centre, Cambridge, UK).

GBI               Hand-edited analysis of genomic sequences.

FL                Stitched or stretched genomic sequences (see Example V).

INCY              Full length transcript and exon prediction from mapping of EST
                  sequences to the genome. Genomic location and EST composition
                  data are combined to predict the exons and resulting transcript.



In some cases, Incyte cDNA coverage redundant with the sequence coverage shown in Table
4 was obtained to confirm the final consensus polynucleotide sequence, but the relevant Incyte cDNA
identification numbers are not shown.

Table 5 shows the representative cDNA libraries for those full length polynucleotides which
were assembled using Incyte cDNA sequences. The representative cDNA library is the Incyte cDNA
library which is most frequently represented by the Incyte cDNA sequences which were used to
assemble and confirm the above polynucleotides. The tissues and vectors which were used to
construct the cDNA libraries shown in Table 5 are described in Table 6.

Table 8 shows single nucleotide polymorphisms (SNPs) found in polynucleotide sequences of
the invention, along with allele frequencies in different human populations. Columns 1 and 2 show
the polynucleotide sequence identification number (SEQ ID NO:) and the corresponding Incyte
project identification number (PID) for polynucleotides of the invention. Column 3 shows the Incyte
identification number for the EST in which the SNP was detected (EST ID), and column 4 shows the
identification number for the SNP (SNP ID). Column 5 shows the position within the EST sequence
at which the SNP is located (EST SNP), and column 6 shows the position of the SNP within the
full-length polynucleotide sequence (CB1 SNP). Column 7 shows the allele found in the EST sequence.
Columns 8 and 9 show the two alleles found at the SNP site. Column 10 shows the amino acid
encoded by the codon including the SNP site, based upon the allele found in the EST. Columns 11-14
show the frequency of allele 1 in four different human populations. An entry of n/d (not detected)
indicates that the frequency of allele 1 in the population was too low to be detected, while n/a (not
available) indicates that the allele frequency was not determined for the population.

The invention also encompasses KPP variants. Various embodiments of KPP variants can
have at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about
90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about
95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% amino acid
sequence identity to the KPP amino acid sequence, and can contain at least one functional or
structural characteristic of KPP.

Various embodiments also encompass polynucleotides which encode KPP. In a particular
embodiment, the invention encompasses a polynucleotide sequence comprising a sequence selected
from the group consisting of SEQ ID NO:44-86, which encodes KPP. The polynucleotide sequences
of SEQ ID NO:44-86, as presented in the Sequence Listing, embrace the equivalent RNA sequences,
wherein occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar
backbone is composed of ribose instead of deoxyribose.

The invention also encompasses variants of a polynucleotide encoding KPP. In particular,
such a variant polynucleotide will have at least about 70%, at least about 75%, at least about 80%, at
least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at
least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at
least about 99% polynucleotide sequence identity to a polynucleotide encoding KPP. A particular
aspect of the invention encompasses a variant of a polynucleotide comprising a sequence selected
from the group consisting of SEQ ID NO:44-86 which has at least about 70%, at least about 75%, at
least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at
least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at
least about 98%, or at least about 99% polynucleotide sequence identity to a nucleic acid sequence
selected from the group consisting of SEQ ID NO:44-86. Any one of the polynucleotide variants
described above can encode a polypeptide which contains at least one functional or structural
characteristic of KPP.

In addition, or in the alternative, a polynucleotide variant of the invention is a splice variant
of a polynucleotide encoding KPP. A splice variant may have portions which have significant
sequence identity to a polynucleotide encoding KPP, but will generally have a greater or lesser
number of nucleotides due to additions or deletions of blocks of sequence arising from alternate
splicing during mRNA processing. A splice variant may have less than about 70%, or alternatively
less than about 60%, or alternatively less than about 50% polynucleotide sequence identity to a
polynucleotide encoding KPP over its entire length; however, portions of the splice variant will have
at least about 70%, or alternatively at least about 85%, or alternatively at least about 95%, or
alternatively 100% polynucleotide sequence identity to portions of the polynucleotide encoding KPP.
For example, a polynucleotide comprising a sequence of SEQ ID NO:48, a polynucleotide comprising
a sequence of SEQ ID NO:49 and a polynucleotide comprising a sequence of SEQ ID NO:50 are
splice variants of each other; a polynucleotide comprising a sequence of SEQ ID NO:75 and a
polynucleotide comprising a sequence of SEQ ID NO:76 are splice variants of each other; a
polynucleotide comprising a sequence of SEQ ID NO:77 and a polynucleotide comprising a sequence
of SEQ ID NO:78 are splice variants of each other; a polynucleotide comprising a sequence of SEQ
ID NO:79 and a polynucleotide comprising a sequence of SEQ ID NO:80 are splice variants of each
other; a polynucleotide comprising a sequence of SEQ ID NO:57 and a polynucleotide comprising a
sequence of SEQ ID NO:62 are splice variants of each other; and a polynucleotide comprising a
sequence of SEQ ID NO:68 and a polynucleotide comprising a sequence of SEQ ID NO:69 are splice
variants of each other. Any one of the splice variants described above can encode a polypeptide
which contains at least one functional or structural characteristic of KPP.

It will be appreciated by those skilled in the art that as a result of the degeneracy of the
genetic code, a multitude of polynucleotide sequences encoding KPP, some bearing minimal
similarity to the polynucleotide sequences of any known and naturally occurring gene, may be
produced. Thus, the invention contemplates each and every possible variation of polynucleotide
sequence that could be made by selecting combinations based on possible codon choices. These
combinations are made in accordance with the standard triplet genetic code as applied to the
polynucleotide sequence of naturally occurring KPP, and all such variations are to be considered as
being specifically disclosed.
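
The scale of the "multitude" that follows from codon degeneracy can be estimated with a short Python sketch. It multiplies the number of synonymous codons available for each residue under the standard genetic code. The table and function names are hypothetical, the four-residue peptide is an arbitrary example rather than a KPP sequence, and stop codons are ignored.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code
# (one-letter residue codes; the 61 sense codons in total).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide: str) -> int:
    """Count the distinct nucleotide sequences that encode the given peptide."""
    # Each residue contributes an independent choice among its synonymous codons.
    return prod(CODONS_PER_AMINO_ACID[residue] for residue in peptide.upper())


print(count_coding_sequences("MKWL"))  # 1 * 2 * 1 * 6 = 12
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why enumerating every variation explicitly is impractical for a full-length protein.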

Although polynucleotides which encode KPP and its variants are generally capable of
hybridizing to polynucleotides encoding naturally occurring KPP under appropriately selected
conditions of stringency, it may be advantageous to produce polynucleotides encoding KPP or its
derivatives possessing a substantially different codon usage, e.g., inclusion of non-naturally occurring
codons. Codons may be selected to increase the rate at which expression of the peptide occurs in a
particular prokaryotic or eukaryotic host in accordance with the frequency with which particular
codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence
encoding KPP and its derivatives without altering the encoded amino acid sequences include the
production of RNA transcripts possessing more desirable properties, such as a greater half-life, than
transcripts produced from the naturally occurring sequence.
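
Selecting codons according to host usage frequency can be sketched in Python as follows. This is a minimal illustration and not a method described in the specification: the usage frequencies are invented for the example and do not describe any real host, the table covers only four residues, and the names HYPOTHETICAL_USAGE and recode_for_host are hypothetical. A practical table would be built from measured codon usage data for the chosen expression host.

```python
# Hypothetical codon usage frequencies for an unnamed host. The numbers are
# invented for illustration and should be replaced with measured values.
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13,
          "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "E": {"GAA": 0.68, "GAG": 0.32},
}


def recode_for_host(peptide: str, usage=HYPOTHETICAL_USAGE) -> str:
    """Encode a peptide using the most frequently used codon for each residue."""
    codons = []
    for residue in peptide.upper():
        if residue not in usage:
            raise KeyError(f"no codon usage data for residue {residue!r}")
        # Pick the synonymous codon with the highest usage frequency.
        codons.append(max(usage[residue], key=usage[residue].get))
    return "".join(codons)


print(recode_for_host("MKLE"))  # ATGAAACTGGAA
```

Only synonymous codons are exchanged, so the encoded amino acid sequence is unchanged while the nucleotide sequence may differ substantially from the naturally occurring one.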

The invention also encompasses production of polynucleotides which encode KPP and KPP
derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic
polynucleotide may be inserted into any of the many available expression vectors and cell systems
using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce
mutations into a polynucleotide encoding KPP or any fragment thereof.

Embodiments of the invention can also include polynucleotides that are capable of
hybridizing to the claimed polynucleotides, and, in particular, to those having the sequences shown in
SEQ ID NO:44-86 and fragments thereof, under various conditions of stringency (Wahl, G.M. and
S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol.
152:507-511). Hybridization conditions, including annealing and wash conditions, are described in
"Definitions."

Methods for DNA sequencing are well known in the art and may be used to practice any of
the embodiments of the invention. The methods may employ such enzymes as the Klenow fragment
of DNA polymerase I, SEQUENASE (US Biochemical, Cleveland OH), Taq polymerase (Applied
Biosystems), thermostable T7 polymerase (Amersham Biosciences, Piscataway NJ), or combinations
of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification
system (Invitrogen, Carlsbad CA). Preferably, sequence preparation is automated with machines such
as the MICROLAB 2200 liquid transfer system (Hamilton, Reno NV), PTC200 thermal cycler (MJ
Research, Watertown MA) and ABI CATALYST 800 thermal cycler (Applied Biosystems).
Sequencing is then carried out using either the ABI 373 or 377 DNA sequencing system (Applied
Biosystems), the MEGABACE 1000 DNA sequencing system (Amersham Biosciences), or other
systems known in the art. The resulting sequences are analyzed using a variety of algorithms which
are well known in the art (Ausubel et al., supra, ch. 7; Meyers, R.A. (1995) Molecular Biology and
Biotechnology, Wiley VCH, New York NY, pp. 856-853).

The nucleic acids encoding KPP may be extended utilizing a partial nucleotide sequence and
employing various PCR-based methods known in the art to detect upstream sequences, such as
promoters and regulatory elements. For example, one method which may be employed,
restriction-site PCR, uses universal and nested primers to amplify unknown sequence from genomic
DNA within a cloning vector (Sarkar, G. (1993) PCR Methods Applic. 2:318-322). Another method,
inverse PCR, uses primers that extend in divergent directions to amplify unknown sequence from a
circularized template. The template is derived from restriction fragments comprising a known
genomic locus and surrounding sequences (Triglia, T. et al. (1988) Nucleic Acids Res. 16:8186). A
third method, capture PCR, involves PCR amplification of DNA fragments adjacent to known
sequences in human and yeast artificial chromosome DNA (Lagerstrom, M. et al. (1991) PCR
Methods Applic. 1:111-119). In this method, multiple restriction enzyme digestions and ligations
may be used to insert an engineered double-stranded sequence into a region of unknown sequence
before performing PCR. Other methods which may be used to retrieve unknown sequences are
known in the art (Parker, J.D. et al. (1991) Nucleic Acids Res. 19:3055-3060). Additionally, one may
use PCR, nested primers, and PROMOTERFINDER libraries (BD Clontech, Palo Alto CA) to walk
genomic DNA. This procedure avoids the need to screen libraries and is useful in finding intron/exon
junctions. For all PCR-based methods, primers may be designed using commercially available
software, such as OLIGO 4.06 primer analysis software (National Biosciences, Plymouth MN) or
another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of
about 50% or more, and to anneal to the template at temperatures of about 68°C to 72°C.
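
As a rough illustration of the primer guidelines above, the following Python sketch checks a candidate
primer against the stated length and GC-content ranges. The function names and default thresholds are
illustrative assumptions based on the figures in the text. The sketch does not model annealing
temperature and is not a substitute for OLIGO or other primer analysis software.

```python
def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_primer_guidelines(primer: str, min_len: int = 22, max_len: int = 30,
                            min_gc: float = 0.50) -> bool:
    """Check length (about 22 to 30 nt) and GC content (about 50% or more)."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc
```

A candidate that fails either check would simply be redesigned; the "about" in the text suggests the
thresholds are guidelines rather than hard limits.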

When screening for full length cDNAs, it is preferable to use libraries that have been
size-selected to include larger cDNAs. In addition, random-primed libraries, which often include
sequences containing the 5' regions of genes, are preferable for situations in which an oligo d(T)
library does not yield a full-length cDNA. Genomic libraries may be useful for extension of sequence
into 5' non-transcribed regulatory regions.

Capillary electrophoresis systems which are commercially available may be used to analyze
the size or confirm the nucleotide sequence of sequencing or PCR products. In particular, capillary
sequencing may employ flowable polymers for electrophoretic separation, four different
nucleotide-specific, laser-stimulated fluorescent dyes, and a charge coupled device camera for
detection of the emitted wavelengths. Output/light intensity may be converted to electrical signal
using appropriate software (e.g., GENOTYPER and SEQUENCE NAVIGATOR, Applied Biosystems), and the
entire process from loading of samples to computer analysis and electronic data display may be
computer controlled. Capillary electrophoresis is especially preferable for sequencing small DNA
fragments which may be present in limited amounts in a particular sample.




In another embodiment of the invention, polynucleotides or fragments thereof which encode
KPP may be cloned in recombinant DNA molecules that direct expression of KPP, or fragments or
functional equivalents thereof, in appropriate host cells. Due to the inherent degeneracy of the
genetic code, other polynucleotides which encode substantially the same or a functionally equivalent
polypeptide may be produced and used to express KPP.
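
The degeneracy point can be illustrated with a short Python sketch in which two different nucleotide
sequences translate to the same peptide. The codon table is a small subset of the standard genetic
code, and the two sequences and the resulting four-residue peptide are hypothetical examples, not
sequences from this document.

```python
# Subset of the standard genetic code, sufficient for this example only.
CODON_TABLE = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Two hypothetical coding sequences that differ at several codon third positions.
seq_a = "ATGAAACCTCCATAA"
seq_b = "ATGAAGCCCCCGTGA"
print(translate(seq_a), translate(seq_b))  # both print MKPP
```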

The polynucleotides of the invention can be engineered using methods generally known in
the art in order to alter KPP-encoding sequences for a variety of purposes including, but not limited
to, modification of the cloning, processing, and/or expression of the gene product. DNA shuffling by
random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be
used to engineer the nucleotide sequences. For example, oligonucleotide-mediated site-directed
mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation
patterns, change codon preference, produce splice variants, and so forth.

The nucleotides of the present invention may be subjected to DNA shuffling techniques such
as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent No.
5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat.
Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or
improve the biological properties of KPP, such as its biological or enzymatic activity or its ability to
bind to other molecules or compounds. DNA shuffling is a process by which a library of gene
variants is produced using PCR-mediated recombination of gene fragments. The library is then
subjected to selection or screening procedures that identify those gene variants with the desired
properties. These preferred variants may then be pooled and further subjected to recursive rounds of
DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial"
breeding and rapid molecular evolution. For example, fragments of a single gene containing random
point mutations may be recombined, screened, and then reshuffled until the desired properties are
optimized. Alternatively, fragments of a given gene may be recombined with fragments of
homologous genes in the same gene family, either from the same or different species, thereby
maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable
manner.
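
The recursive shuffle-and-select cycle described above can be sketched as a toy in silico model.
This is only an analogy under stated assumptions: single-point crossover stands in for PCR-mediated
recombination, and the target string, parent sequences, scoring function, and all parameter values
are hypothetical.

```python
import random


def recombine(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """Join the start of one parent to the end of another at a random crossover point."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def shuffle_and_select(pool, fitness, rounds=5, library_size=40, keep=4, seed=0):
    """Run recursive rounds of recombination followed by selection of the fittest variants."""
    rng = random.Random(seed)
    pool = list(pool)
    for _ in range(rounds):
        library = []
        for _ in range(library_size):
            parent_a, parent_b = rng.sample(pool, 2)
            library.append(recombine(parent_a, parent_b, rng))
        # Parents stay in the candidate set, so the best score never decreases.
        pool = sorted(pool + library, key=fitness, reverse=True)[:keep]
    return pool


# Hypothetical target and parents, used only to give the toy screen something to score.
TARGET = "ACGT" * 3
PARENTS = ["ACGTAC" + "A" * 6, "T" * 6 + "GTACGT", "ACGT" + "T" * 8, "A" * 8 + "ACGT"]


def matches_target(seq: str) -> int:
    """Score a variant by the number of positions that agree with the target."""
    return sum(a == b for a, b in zip(seq, TARGET))


best_variants = shuffle_and_select(PARENTS, matches_target)
```

Keeping the parents in each round's candidate set is a design choice of this sketch, made so that
selection can never lose the best variant found so far.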

In another embodiment, polynucleotides encoding KPP may be synthesized, in whole or in
part, using one or more chemical methods well known in the art (Caruthers, M.H. et al. (1980)
Nucleic Acids Symp. Ser. 7:215-223; Horn, T. et al. (1980) Nucleic Acids Symp. Ser. 7:225-232).
Alternatively, KPP itself or a fragment thereof may be synthesized using chemical methods known in
the art. For example, peptide synthesis can be performed using various solution-phase or solid-phase
techniques (Creighton, T. (1984) Proteins, Structures and Molecular Properties, WH Freeman, New
York NY, pp. 55-60; Roberge, J.Y. et al. (1995) Science 269:202-204). Automated synthesis may be
achieved using the ABI 431A peptide synthesizer (Applied Biosystems). Additionally, the amino
acid sequence of KPP, or any part thereof, may be altered during direct synthesis and/or combined
with sequences from other proteins, or any part thereof, to produce a variant polypeptide or a
polypeptide having a sequence of a naturally occurring polypeptide.

The peptide may be substantially purified by preparative high performance liquid 
chromatography (Chiez, R.M. and F.Z. Regnier (1990) Methods Enzymol. 182:392-421). The 
composition of the synthetic peptides may be confirmed by amino acid analysis or by sequencing 
(Creighton, supra, pp. 28-53). 

In order to express a biologically active KPP, the polynucleotides encoding KPP or
derivatives thereof may be inserted into an appropriate expression vector, i.e., a vector which contains
the necessary elements for transcriptional and translational control of the inserted coding sequence in
a suitable host. These elements include regulatory sequences, such as enhancers, constitutive and
inducible promoters, and 5' and 3' untranslated regions in the vector and in polynucleotides encoding
KPP. Such elements may vary in their strength and specificity. Specific initiation signals may also
be used to achieve more efficient translation of polynucleotides encoding KPP. Such signals include
the ATG initiation codon and adjacent sequences, e.g., the Kozak sequence. In cases where a
polynucleotide sequence encoding KPP and its initiation codon and upstream regulatory sequences
are inserted into the appropriate expression vector, no additional transcriptional or translational
control signals may be needed. However, in cases where only coding sequence, or a fragment
thereof, is inserted, exogenous translational control signals including an in-frame ATG initiation
codon should be provided by the vector. Exogenous translational elements and initiation codons may
be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by
the inclusion of enhancers appropriate for the particular host cell system used (Scharf, D. et al. (1994)
Results Probl. Cell Differ. 20:125-162).
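
To make the idea of an initiation codon and its adjacent sequences concrete, the following Python
sketch scans a sequence for ATG codons and flags those in a favorable context. The rule used here (a
purine at position -3 and a G at position +4 relative to the A of the ATG) is a commonly cited
simplification of the Kozak consensus and is an assumption of this sketch, not a criterion stated in
the text. The function name and example sequences are hypothetical.

```python
def start_codon_contexts(sequence: str):
    """Yield (index, strong) for each ATG; strong means a purine at -3 and a G at +4."""
    seq = sequence.upper()
    index = seq.find("ATG")
    while index != -1:
        has_purine = index >= 3 and seq[index - 3] in "AG"
        has_g = index + 3 < len(seq) and seq[index + 3] == "G"
        yield index, has_purine and has_g
        index = seq.find("ATG", index + 1)


# A hypothetical insert with the ATG embedded in a GCCACC...G context.
print(list(start_codon_contexts("GCCACCATGGCG")))  # [(6, True)]
```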

Methods which are well known to those skilled in the art may be used to construct expression
vectors containing polynucleotides encoding KPP and appropriate transcriptional and translational
control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques,
and in vivo genetic recombination (Sambrook and Russell, supra, ch. 1-4, and 8; Ausubel et al.,
supra, ch. 1, 3, and 15).

A variety of expression vector/host systems may be utilized to contain and express
polynucleotides encoding KPP. These include, but are not limited to, microorganisms such as
bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors;
yeast transformed with yeast expression vectors; insect cell systems infected with viral expression
vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g.,
cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors
(e.g., Ti or pBR322 plasmids); or animal cell systems (Sambrook and Russell, supra; Ausubel et al.,
supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Engelhard, E.K. et al.
(1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther.
7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; The McGraw Hill Yearbook of Science and
Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc.
Natl. Acad. Sci. USA 81:3655-3659; Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355).
Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from
various bacterial plasmids, may be used for delivery of polynucleotides to the targeted organ, tissue,
or cell population (Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5:350-356; Yu, M. et al. (1993)
Proc. Natl. Acad. Sci. USA 90:6340-6344; Buller, R.M. et al. (1985) Nature 317:813-815; McGregor,
D.P. et al. (1994) Mol. Immunol. 31:219-226; Verma, I.M. and N. Somia (1997) Nature
389:239-242). The invention is not limited by the host cell employed.

In bacterial systems, a number of cloning and expression vectors may be selected depending
upon the use intended for polynucleotides encoding KPP. For example, routine cloning, subcloning,
and propagation of polynucleotides encoding KPP can be achieved using a multifunctional E. coli
vector such as PBLUESCRIPT (Stratagene, La Jolla CA) or PSPORT1 plasmid (Invitrogen).
Ligation of polynucleotides encoding KPP into the vector's multiple cloning site disrupts the lacZ
gene, allowing a colorimetric screening procedure for identification of transformed bacteria
containing recombinant molecules. In addition, these vectors may be useful for in vitro transcription,
dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the
cloned sequence (Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509). When
large quantities of KPP are needed, e.g., for the production of antibodies, vectors which direct high
level expression of KPP may be used. For example, vectors containing the strong, inducible SP6 or
T7 bacteriophage promoter may be used.

Yeast expression systems may be used for production of KPP. A number of vectors
containing constitutive or inducible promoters, such as alpha factor, alcohol oxidase, and PGH
promoters, may be used in the yeast Saccharomyces cerevisiae or Pichia pastoris. In addition, such
vectors direct either the secretion or intracellular retention of expressed proteins and enable
integration of foreign polynucleotide sequences into the host genome for stable propagation (Ausubel
et al., supra; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; Scorer, C.A. et al. (1994)
Bio/Technology 12:181-184).

Plant systems may also be used for expression of KPP. Transcription of polynucleotides
encoding KPP may be driven by viral promoters, e.g., the 35S and 19S promoters of CaMV used
alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J.
6:307-311). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock
promoters may be used (Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984)
Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105). These
constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated
transfection (The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New
York NY, pp. 191-196).

In mammalian cells, a number of viral-based expression systems may be utilized. In cases
where an adenovirus is used as an expression vector, polynucleotides encoding KPP may be ligated
into an adenovirus transcription/translation complex consisting of the late promoter and tripartite
leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to
obtain infective virus which expresses KPP in host cells (Logan, J. and T. Shenk (1984) Proc. Natl.
Acad. Sci. USA 81:3655-3659). In addition, transcription enhancers, such as the Rous sarcoma virus
(RSV) enhancer, may be used to increase expression in mammalian host cells. SV40 or EBV-based
vectors may also be used for high-level protein expression.

Human artificial chromosomes (HACs) may also be employed to deliver larger fragments of
DNA than can be contained in and expressed from a plasmid. HACs of about 6 kb to 10 Mb are
constructed and delivered via conventional delivery methods (liposomes, polycationic amino
polymers, or vesicles) for therapeutic purposes (Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355).

For long term production of recombinant proteins in mammalian systems, stable expression
of KPP in cell lines is preferred. For example, polynucleotides encoding KPP can be transformed
into cell lines using expression vectors which may contain viral origins of replication and/or
endogenous expression elements and a selectable marker gene on the same or on a separate vector.
Following the introduction of the vector, cells may be allowed to grow for about 1 to 2 days in
enriched media before being switched to selective media. The purpose of the selectable marker is to
confer resistance to a selective agent, and its presence allows growth and recovery of cells which
successfully express the introduced sequences. Resistant clones of stably transformed cells may be
propagated using tissue culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell lines. These
include, but are not limited to, the herpes simplex virus thymidine kinase and adenine
phosphoribosyltransferase genes, for use in tk- and apr- cells, respectively (Wigler, M. et al. (1977)
Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823). Also, antimetabolite, antibiotic, or
herbicide resistance can be used as the basis for selection. For example, dhfr confers resistance to
methotrexate; neo confers resistance to the aminoglycosides neomycin and G-418; and als and pat
confer resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively (Wigler, M. et
al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol.
150:1-14). Additional selectable genes have been described, e.g., trpB and hisD, which alter cellular
requirements for metabolites (Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA
85:8047-8051). Visible markers, e.g., anthocyanins, green fluorescent proteins (GFP; BD Clontech),
β-glucuronidase and its substrate β-glucuronide, or luciferase and its substrate luciferin may be used.
These markers can be used not only to identify transformants, but also to quantify the amount of
transient or stable protein expression attributable to a specific vector system (Rhodes, C.A. (1995)
Methods Mol. Biol. 55:121-131).

Although the presence/absence of marker gene expression suggests that the gene of interest is
also present, the presence and expression of the gene may need to be confirmed. For example, if the
sequence encoding KPP is inserted within a marker gene sequence, transformed cells containing
polynucleotides encoding KPP can be identified by the absence of marker gene function.
Alternatively, a marker gene can be placed in tandem with a sequence encoding KPP under the
control of a single promoter. Expression of the marker gene in response to induction or selection
usually indicates expression of the tandem gene as well.

In general, host cells that contain the polynucleotide encoding KPP and that express KPP may
be identified by a variety of procedures known to those of skill in the art. These procedures include,
but are not limited to, DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein
bioassay or immunoassay techniques which include membrane, solution, or chip based technologies
for the detection and/or quantification of nucleic acid or protein sequences.

Immunological methods for detecting and measuring the expression of KPP using either
specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques
include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and
fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing
monoclonal antibodies reactive to two non-interfering epitopes on KPP is preferred, but a competitive
binding assay may be employed. These and other assays are well known in the art (Hampton, R. et al.
(1990) Serological Methods, a Laboratory Manual, APS Press, St. Paul MN, Sect. IV; Coligan, J.E. et
al. (1997) Current Protocols in Immunology, Greene Pub. Associates and Wiley-Interscience, New
York NY; Pound, J.D. (1998) Immunochemical Protocols, Humana Press, Totowa NJ).

A wide variety of labels and conjugation techniques are known by those skilled in the art and
may be used in various nucleic acid and amino acid assays. Means for producing labeled
hybridization or PCR probes for detecting sequences related to polynucleotides encoding KPP include
oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide.
Alternatively, polynucleotides encoding KPP, or any fragments thereof, may be cloned into a vector
for the production of an mRNA probe. Such vectors are known in the art, are commercially available,
and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase
such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety
of commercially available kits, such as those provided by Amersham Biosciences, Promega (Madison
WI), and US Biochemical. Suitable reporter molecules or labels which may be used for ease of
detection include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents, as
well as substrates, cofactors, inhibitors, magnetic particles, and the like.

Host cells transformed with polynucleotides encoding KPP may be cultured under conditions
suitable for the expression and recovery of the protein from cell culture. The protein produced by a
transformed cell may be secreted or retained intracellularly depending on the sequence and/or the
vector used. As will be understood by those of skill in the art, expression vectors containing
polynucleotides which encode KPP may be designed to contain signal sequences which direct
secretion of KPP through a prokaryotic or eukaryotic cell membrane.

In addition, a host cell strain may be chosen for its ability to modulate expression of the
inserted polynucleotides or to process the expressed protein in the desired fashion. Such
modifications of the polypeptide include, but are not limited to, acetylation, carboxylation,
glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves
a "prepro" or "pro" form of the protein may also be used to specify protein targeting, folding, and/or
activity. Different host cells which have specific cellular machinery and characteristic mechanisms
for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are available from the
American Type Culture Collection (ATCC, Manassas VA) and may be chosen to ensure the correct
modification and processing of the foreign protein.

In another embodiment of the invention, natural, modified, or recombinant polynucleotides
encoding KPP may be ligated to a heterologous sequence resulting in translation of a fusion protein in
any of the aforementioned host systems. For example, a chimeric KPP protein containing a
heterologous moiety that can be recognized by a commercially available antibody may facilitate the
screening of peptide libraries for inhibitors of KPP activity. Heterologous protein and peptide
moieties may also facilitate purification of fusion proteins using commercially available affinity
matrices. Such moieties include, but are not limited to, glutathione S-transferase (GST), maltose
binding protein (MBP), thioredoxin (Trx), calmodulin binding peptide (CBP), 6-His, FLAG, c-myc,
and hemagglutinin (HA). GST, MBP, Trx, CBP, and 6-His enable purification of their cognate fusion
proteins on immobilized glutathione, maltose, phenylarsine oxide, calmodulin, and metal-chelate
resins, respectively. FLAG, c-myc, and hemagglutinin (HA) enable immunoaffinity purification of
fusion proteins using commercially available monoclonal and polyclonal antibodies that specifically
recognize these epitope tags. A fusion protein may also be engineered to contain a proteolytic
cleavage site located between the KPP encoding sequence and the heterologous protein sequence, so
that KPP may be cleaved away from the heterologous moiety following purification. Methods for
fusion protein expression and purification are discussed in Ausubel et al. (supra, ch. 10 and 16). A
variety of commercially available kits may also be used to facilitate expression and purification of
fusion proteins.
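
The pairing of fusion moieties with purification matrices listed above can be summarized as a simple
lookup, sketched here in Python. The dictionary contents restate the pairings given in the text; the
variable and function names are illustrative assumptions.

```python
# Fusion moiety -> affinity matrix, as paired in the paragraph above.
AFFINITY_RESIN = {
    "GST": "immobilized glutathione",
    "MBP": "maltose",
    "Trx": "phenylarsine oxide",
    "CBP": "calmodulin",
    "6-His": "metal-chelate resin",
}

# Epitope tags purified with tag-specific antibodies rather than a small-molecule matrix.
IMMUNOAFFINITY_TAGS = {"FLAG", "c-myc", "HA"}


def purification_route(tag: str) -> str:
    """Return the purification approach the text associates with a fusion tag."""
    if tag in AFFINITY_RESIN:
        return f"affinity purification on {AFFINITY_RESIN[tag]}"
    if tag in IMMUNOAFFINITY_TAGS:
        return "immunoaffinity purification with a tag-specific antibody"
    raise KeyError(f"unknown fusion tag: {tag}")
```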

In another embodiment, synthesis of radiolabeled KPP may be achieved in vitro using the
TNT rabbit reticulocyte lysate or wheat germ extract system (Promega). These systems couple
transcription and translation of protein-coding sequences operably associated with the T7, T3, or SP6
promoters. Translation takes place in the presence of a radiolabeled amino acid precursor, for
example, ³⁵S-methionine.

KPP, fragments of KPP, or variants of KPP may be used to screen for compounds that
specifically bind to KPP. One or more test compounds may be screened for specific binding to KPP.
In various embodiments, 1, 2, 3, 4, 5, 10, 20, 50, 100, or 200 test compounds can be screened for
specific binding to KPP. Examples of test compounds can include antibodies, anticalins,
oligonucleotides, proteins (e.g., ligands or receptors), or small molecules.

In related embodiments, variants of KPP can be used to screen for binding of test compounds,
such as antibodies, to KPP, a variant of KPP, or a combination of KPP and/or one or more variants
of KPP. In an embodiment, a variant of KPP can be used to screen for compounds that bind to a
variant of KPP, but not to KPP having the exact sequence of a sequence of SEQ ID NO:1-43. KPP
variants used to perform such screening can have a range of about 50% to about 99% sequence
identity to KPP, with various embodiments having 60%, 70%, 75%, 80%, 85%, 90%, and 95% sequence
identity.
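
For orientation, percent sequence identity can be computed as in the Python sketch below. The sketch
assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it
counts identical residues over the full alignment length. That denominator is a simplifying
assumption; the identity measure that governs this document is the one set out in its definitions.
The function names and example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def in_variant_range(identity: float, low: float = 50.0, high: float = 99.0) -> bool:
    """True if identity falls in the about 50% to about 99% range given in the text."""
    return low <= identity <= high


print(percent_identity("MKPPAAAAAA", "MKPPAAAAAT"))  # 90.0
```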

In an embodiment, a compound identified in a screen for specific binding to KPP can be
closely related to the natural ligand of KPP, e.g., a ligand or fragment thereof, a natural substrate, a
structural or functional mimetic, or a natural binding partner (Coligan, J.E. et al. (1991) Current
Protocols in Immunology 1(2):Chapter 5). In another embodiment, the compound thus identified can
be a natural ligand of a receptor KPP (Howard, A.D. et al. (2001) Trends Pharmacol. Sci. 22:132-140;
Wise, A. et al. (2002) Drug Discovery Today 7:235-246).

In other embodiments, a compound identified in a screen for specific binding to KPP can be
closely related to the natural receptor to which KPP binds, at least a fragment of the receptor, or a
fragment of the receptor including all or a portion of the ligand binding site or binding pocket. For
example, the compound may be a receptor for KPP which is capable of propagating a signal, or a
decoy receptor for KPP which is not capable of propagating a signal (Ashkenazi, A. and V.M. Dixit
(1999) Curr. Opin. Cell Biol. 11:255-260; Mantovani, A. et al. (2001) Trends Immunol. 22:328-336).
The compound can be rationally designed using known techniques. Examples of such techniques
include those used to construct the compound etanercept (ENBREL; Amgen Inc., Thousand Oaks
CA), which is efficacious for treating rheumatoid arthritis in humans. Etanercept is an engineered
p75 tumor necrosis factor (TNF) receptor dimer linked to the Fc portion of human IgG1 (Taylor, P.C.
et al. (2001) Curr. Opin. Immunol. 13:611-616).

In one embodiment, two or more antibodies having similar or, alternatively, different
specificities can be screened for specific binding to KPP, fragments of KPP, or variants of KPP. The
binding specificity of the antibodies thus screened can thereby be selected to identify particular
fragments or variants of KPP. In one embodiment, an antibody can be selected such that its binding
specificity allows for preferential identification of specific fragments or variants of KPP. In another
embodiment, an antibody can be selected such that its binding specificity allows for preferential
diagnosis of a specific disease or condition having increased, decreased, or otherwise abnormal
production of KPP.

In an embodiment, anticalins can be screened for specific binding to KPP, fragments of KPP,
or variants of KPP. Anticalins are ligand-binding proteins that have been constructed based on a
lipocalin scaffold (Weiss, G.A. and H.B. Lowman (2000) Chem. Biol. 7:R177-R184; Skerra, A.
(2001) J. Biotechnol. 74:257-275). The protein architecture of lipocalins can include a beta-barrel
having eight antiparallel beta-strands, which supports four loops at its open end. These loops form
the natural ligand-binding site of the lipocalins, a site which can be re-engineered in vitro by amino
acid substitutions to impart novel binding specificities. The amino acid substitutions can be made
using methods known in the art or described herein, and can include conservative substitutions (e.g.,
substitutions that do not alter binding specificity) or substitutions that modestly, moderately, or
significantly alter binding specificity.

In one embodiment, screening for compounds which specifically bind to, stimulate, or inhibit
KPP involves producing appropriate cells which express KPP, either as a secreted protein or on the
cell membrane. Preferred cells can include cells from mammals, yeast, Drosophila, or E. coli. Cells
expressing KPP or cell membrane fractions which contain KPP are then contacted with a test
compound and binding, stimulation, or inhibition of activity of either KPP or the compound is
analyzed.

An assay may simply test binding of a test compound to the polypeptide, wherein binding is
detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. For example,
the assay may comprise the steps of combining at least one test compound with KPP, either in
solution or affixed to a solid support, and detecting the binding of KPP to the compound.
Alternatively, the assay may detect or measure binding of a test compound in the presence of a
labeled competitor. Additionally, the assay may be carried out using cell-free preparations, chemical
libraries, or natural product mixtures, and the test compound(s) may be free in solution or affixed to a
solid support.

An assay can be used to assess the ability of a compound to bind to its natural ligand and/or
to inhibit the binding of its natural ligand to its natural receptors. Examples of such assays include
radio-labeling assays such as those described in U.S. Patent No. 5,914,236 and U.S. Patent No.
6,372,724. In a related embodiment, one or more amino acid substitutions can be introduced into a
polypeptide compound (such as a receptor) to improve or alter its ability to bind to its natural ligands
(Matthews, D.J. and J.A. Wells (1994) Chem. Biol. 1:25-30). In another related embodiment, one or
more amino acid substitutions can be introduced into a polypeptide compound (such as a ligand) to
improve or alter its ability to bind to its natural receptors (Cunningham, B.C. and J.A. Wells (1991)
Proc. Natl. Acad. Sci. USA 88:3407-3411; Lowman, H.B. et al. (1991) J. Biol. Chem.
266:10982-10988).

KPP, fragments of KPP, or variants of KPP may be used to screen for compounds that
modulate the activity of KPP. Such compounds may include agonists, antagonists, or partial or
inverse agonists. In one embodiment, an assay is performed under conditions permissive for KPP
activity, wherein KPP is combined with at least one test compound, and the activity of KPP in the
presence of a test compound is compared with the activity of KPP in the absence of the test
compound. A change in the activity of KPP in the presence of the test compound is indicative of a
compound that modulates the activity of KPP. Alternatively, a test compound is combined with an in
vitro or cell-free system comprising KPP under conditions suitable for KPP activity, and the assay is
performed. In either of these assays, a test compound which modulates the activity of KPP may do so
indirectly and need not come in direct contact with the test compound. At least one and up to a
plurality of test compounds may be screened.
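
The comparison at the heart of such an assay, activity with the test compound versus activity
without it, can be sketched in Python as follows. The fold-change threshold, the function name, and
the category labels are hypothetical choices made for illustration; the text itself specifies only
that a change in activity indicates a modulating compound.

```python
def classify_modulation(activity_with: float, activity_without: float,
                        min_fold_change: float = 1.5) -> str:
    """Compare activity in the presence and absence of a test compound."""
    if activity_with < 0 or activity_without <= 0:
        raise ValueError("activity_with must be >= 0 and activity_without must be > 0")
    ratio = activity_with / activity_without
    if ratio >= min_fold_change:
        return "enhances activity"
    if ratio <= 1 / min_fold_change:
        return "reduces activity"
    return "no clear change"
```

In practice the threshold would be set from replicate measurements and assay noise rather than fixed
in advance.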

In another embodiment, polynucleotides encoding KPP or their mammalian homologs may be
"knocked out" in an animal model system using homologous recombination in embryonic stem (ES)
cells. Such techniques are well known in the art and are useful for the generation of animal models of
human disease (see, e.g., U.S. Patent No. 5,175,383 and U.S. Patent No. 5,767,337). For example,
mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and
grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted
by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science
244:1288-1292). The vector integrates into the corresponding region of the host genome by
homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP
system to knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D.
(1996) Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from
the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and
the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous
strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.

Polynucleotides encoding KPP may also be manipulated in vitro in ES cells derived from
human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell
lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate
into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al.
(1998) Science 282:1145-1147).

Polynucleotides encoding KPP can also be used to create "knockin" humanized animals 
(pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a
region of a polynucleotide encoding KPP is injected into animal ES cells, and the injected sequence
integrates into the animal cell genome. Transformed cells are injected into blastulae, and the
blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and
treated with potential pharmaceutical agents to obtain information on treatment of a human disease.
Alternatively, a mammal inbred to overexpress KPP, e.g., by secreting KPP in its milk, may also
serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
THERAPEUTICS 

Chemical and structural similarity, e.g., in the context of sequences and motifs, exists
between regions of KPP and kinases and phosphatases. In addition, examples of tissues expressing
KPP can be found in Table 6 and can also be found in Example XI. Therefore, KPP appears to play a
role in cardiovascular diseases, immune system disorders, neurological disorders, disorders affecting
growth and development, lipid disorders, cell proliferative disorders, and cancers. In the treatment of
disorders associated with increased KPP expression or activity, it is desirable to decrease the
expression or activity of KPP. In the treatment of disorders associated with decreased KPP
expression or activity, it is desirable to increase the expression or activity of KPP.

Therefore, in one embodiment, KPP or a fragment or derivative thereof may be administered
to a subject to treat or prevent a disorder associated with decreased expression or activity of KPP.
Examples of such disorders include, but are not limited to, a cardiovascular disease such as
arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial
dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and
complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass
graft surgery, congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction,
hypertensive heart disease, degenerative valvular heart disease, calcific aortic valve stenosis,
congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever
and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis
of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis,
neoplastic heart disease, congenital heart disease, and complications of cardiac transplantation,
congenital lung anomalies, atelectasis, pulmonary congestion and edema, pulmonary embolism,
pulmonary hemorrhage, pulmonary infarction, pulmonary hypertension, vascular sclerosis,
obstructive pulmonary disease, restrictive pulmonary disease, chronic obstructive pulmonary disease,
emphysema, chronic bronchitis, bronchial asthma, bronchiectasis, bacterial pneumonia, viral and
mycoplasmal pneumonia, lung abscess, pulmonary tuberculosis, diffuse interstitial diseases,
pneumoconioses, sarcoidosis, idiopathic pulmonary fibrosis, desquamative interstitial pneumonitis,
hypersensitivity pneumonitis, pulmonary eosinophilia, bronchiolitis obliterans-organizing pneumonia,
diffuse pulmonary hemorrhage syndromes, Goodpasture's syndromes, idiopathic pulmonary
hemosiderosis, pulmonary involvement in collagen-vascular disorders, pulmonary alveolar
proteinosis, lung tumors, inflammatory and noninflammatory pleural effusions, pneumothorax,
pleural tumors, drug-induced lung disease, radiation-induced lung disease, and complications of lung
transplantation; an immune system disorder such as acquired immunodeficiency syndrome (AIDS),
Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis,
anemia, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune
polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact
dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema,
episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic
gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's
thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis,
myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis,
psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjögren's syndrome, systemic
anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative
colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal
circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a
neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms,
Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other
extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive
neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other
demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural
abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous
system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and
Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the
nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis,
encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central
nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic
nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other
neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis,
inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental
disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD),
akathesia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses,
postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration,
and familial frontotemporal dementia; a disorder affecting growth and development such as actinic
keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue
disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis,
primary thrombocythemia, renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic
dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR
syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation),
Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary
keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis,
hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy,
spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural
hearing loss; a lipid disorder such as fatty liver, cholestasis, primary biliary cirrhosis, carnitine
deficiency, carnitine palmitoyltransferase deficiency, myoadenylate deaminase deficiency,
hypertriglyceridemia, lipid storage disorders such as Fabry's disease, Gaucher's disease,
Niemann-Pick's disease, metachromatic leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and
ceroid lipofuscinosis, abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus,
lipodystrophy, lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid
adrenal hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia,
hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism,
renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous
xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease,
hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and a cell proliferative disorder such as
actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective
tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera,
psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma,
melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland,
bladder, bone, bone marrow, brain, breast, cervix, colon, gall bladder, ganglia, gastrointestinal tract,
heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin,
spleen, testis, thymus, thyroid, uterus, leukemias such as multiple myeloma, and lymphomas such as
Hodgkin's disease.

In another embodiment, a vector capable of expressing KPP or a fragment or derivative
thereof may be administered to a subject to treat or prevent a disorder associated with decreased
expression or activity of KPP including, but not limited to, those described above.

In a further embodiment, a composition comprising a substantially purified KPP in
conjunction with a suitable pharmaceutical carrier may be administered to a subject to treat or prevent 
a disorder associated with decreased expression or activity of KPP including, but not limited to, those 
provided above. 

In still another embodiment, an agonist which modulates the activity of KPP may be
administered to a subject to treat or prevent a disorder associated with decreased expression or
activity of KPP including, but not limited to, those listed above.

In a further embodiment, an antagonist of KPP may be administered to a subject to treat or
prevent a disorder associated with increased expression or activity of KPP. Examples of such
disorders include, but are not limited to, those cardiovascular diseases, immune system disorders,
neurological disorders, disorders affecting growth and development, lipid disorders, cell proliferative
disorders, and cancers described above. In one aspect, an antibody which specifically binds KPP may
be used directly as an antagonist or indirectly as a targeting or delivery mechanism for bringing a
pharmaceutical agent to cells or tissues which express KPP.

In an additional embodiment, a vector expressing the complement of the polynucleotide 
encoding KPP may be administered to a subject to treat or prevent a disorder associated with
increased expression or activity of KPP including, but not limited to, those described above.

In other embodiments, any protein, agonist, antagonist, antibody, complementary sequence,
or vector embodiments may be administered in combination with other appropriate therapeutic
agents. Selection of the appropriate agents for use in combination therapy may be made by one of
ordinary skill in the art, according to conventional pharmaceutical principles. The combination of
therapeutic agents may act synergistically to effect the treatment or prevention of the various 
disorders described above. Using this approach, one may be able to achieve therapeutic efficacy with 
lower dosages of each agent, thus reducing the potential for adverse side effects. 

An antagonist of KPP may be produced using methods which are generally known in the art.
In particular, purified KPP may be used to produce antibodies or to screen libraries of pharmaceutical
agents to identify those which specifically bind KPP. Antibodies to KPP may also be generated using
methods that are well known in the art. Such antibodies may include, but are not limited to,
polyclonal, monoclonal, chimeric, and single chain antibodies, Fab fragments, and fragments
produced by a Fab expression library. In an embodiment, neutralizing antibodies (i.e., those which
inhibit dimer formation) can be used therapeutically. Single chain antibodies (e.g., from camels or
llamas) may be potent enzyme inhibitors and may have application in the design of peptide mimetics,
and in the development of immuno-adsorbents and biosensors (Muyldermans, S. (2001) J. Biotechnol.
74:277-302).






For the production of antibodies, various hosts including goats, rabbits, rats, mice, camels,
dromedaries, llamas, humans, and others may be immunized by injection with KPP or with any
fragment or oligopeptide thereof which has immunogenic properties. Depending on the host species,
various adjuvants may be used to increase immunological response. Such adjuvants include, but are
not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such
as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, KLH, and dinitrophenol.
Among adjuvants used in humans, BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are
especially preferable.

It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies to KPP 
have an amino acid sequence consisting of at least about 5 amino acids, and generally will consist of
at least about 10 amino acids. It is also preferable that these oligopeptides, peptides, or fragments are 
substantially identical to a portion of the amino acid sequence of the natural protein. Short stretches 
of KPP amino acids may be fused with those of another protein, such as KLH, and antibodies to the 
chimeric molecule may be produced. 

Monoclonal antibodies to KPP may be prepared using any technique which provides for the
production of antibody molecules by continuous cell lines in culture. These include, but are not
limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma
technique (Kohler, G. et al. (1975) Nature 256:495-497; Kozbor, D. et al. (1985) J. Immunol.
Methods 81:31-42; Cote, R.J. et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030; Cole, S.P. et al.
(1984) Mol. Cell Biol. 62:109-120).

In addition, techniques developed for the production of "chimeric antibodies," such as the 
splicing of mouse antibody genes to human antibody genes to obtain a molecule with appropriate 
antigen specificity and biological activity, can be used (Morrison, S.L. et al. (1984) Proc. Natl. Acad.
Sci. USA 81:6851-6855; Neuberger, M.S. et al. (1984) Nature 312:604-608; Takeda, S. et al. (1985)
Nature 314:452-454). Alternatively, techniques described for the production of single chain
antibodies may be adapted, using methods known in the art, to produce KPP-specific single chain
antibodies. Antibodies with related specificity, but of distinct idiotypic composition, may be
generated by chain shuffling from random combinatorial immunoglobulin libraries (Burton, D.R.
(1991) Proc. Natl. Acad. Sci. USA 88:10134-10137).

Antibodies may also be produced by inducing in vivo production in the lymphocyte
population or by screening immunoglobulin libraries or panels of highly specific binding reagents as
disclosed in the literature (Orlandi, R. et al. (1989) Proc. Natl. Acad. Sci. USA 86:3833-3837; Winter,
G. et al. (1991) Nature 349:293-299).

Antibody fragments which contain specific binding sites for KPP may also be generated. For
example, such fragments include, but are not limited to, F(ab')2 fragments produced by pepsin
digestion of the antibody molecule and Fab fragments generated by reducing the disulfide bridges of
the F(ab')2 fragments. Alternatively, Fab expression libraries may be constructed to allow rapid and
easy identification of monoclonal Fab fragments with the desired specificity (Huse, W.D. et al. (1989)
Science 246:1275-1281).

Various immunoassays may be used for screening to identify antibodies having the desired
specificity. Numerous protocols for competitive binding or immunoradiometric assays using either
polyclonal or monoclonal antibodies with established specificities are well known in the art. Such
immunoassays typically involve the measurement of complex formation between KPP and its specific
antibody. A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to
two non-interfering KPP epitopes is generally used, but a competitive binding assay may also be
employed (Pound, supra).

Various methods such as Scatchard analysis in conjunction with radioimmunoassay
techniques may be used to assess the affinity of antibodies for KPP. Affinity is expressed as an
association constant, Ka, which is defined as the molar concentration of KPP-antibody complex
divided by the molar concentrations of free antigen and free antibody under equilibrium conditions.
The Ka determined for a preparation of polyclonal antibodies, which are heterogeneous in their
affinities for multiple KPP epitopes, represents the average affinity, or avidity, of the antibodies for
KPP. The Ka determined for a preparation of monoclonal antibodies, which are monospecific for a
particular KPP epitope, represents a true measure of affinity. High-affinity antibody preparations
with Ka ranging from about 10^9 to 10^12 L/mole are preferred for use in immunoassays in which the
KPP-antibody complex must withstand rigorous manipulations. Low-affinity antibody preparations
with Ka ranging from about 10^6 to 10^7 L/mole are preferred for use in immunopurification and similar
procedures which ultimately require dissociation of KPP, preferably in active form, from the antibody
(Catty, D. (1988) Antibodies, Volume I: A Practical Approach, IRL Press, Washington DC; Liddell,
J.E. and A. Cryer (1991) A Practical Guide to Monoclonal Antibodies, John Wiley & Sons, New
York NY).
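
By way of illustration only, the definition of the association constant given above, and its estimation from a Scatchard plot, can be sketched in Python. This is a minimal sketch and not part of the disclosed methods: the function names, the assumption of a single class of binding sites, and the example concentrations are hypothetical and chosen purely for demonstration.

```python
def association_constant(complex_molar, free_antigen_molar, free_antibody_molar):
    """Return Ka in L/mole: [complex] / ([free antigen] * [free antibody]).

    All concentrations are molar and are assumed to be measured at equilibrium.
    """
    if free_antigen_molar <= 0 or free_antibody_molar <= 0:
        raise ValueError("free concentrations must be positive")
    return complex_molar / (free_antigen_molar * free_antibody_molar)


def scatchard_ka(bound_molar, free_molar):
    """Estimate Ka from paired bound/free antigen concentrations.

    Assumption: a single class of binding sites, so that bound/free plotted
    against bound is a straight line with slope -Ka. The slope is fitted by
    ordinary least squares.
    """
    if len(bound_molar) != len(free_molar) or len(bound_molar) < 2:
        raise ValueError("need at least two paired measurements")
    ratios = [b / f for b, f in zip(bound_molar, free_molar)]
    n = len(bound_molar)
    mean_x = sum(bound_molar) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in bound_molar)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound_molar, ratios))
    if sxx == 0:
        raise ValueError("bound concentrations must not all be identical")
    return -sxy / sxx


if __name__ == "__main__":
    # Hypothetical equilibrium concentrations (mol/L), for demonstration only.
    ka = association_constant(2e-9, 1e-9, 4e-9)
    print(f"Ka = {ka:.3g} L/mole")
```

For a polyclonal preparation, a value obtained in this way would correspond to the average affinity described above rather than to the affinity for any single epitope.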

The titer and avidity of polyclonal antibody preparations may be further evaluated to 
determine the quality and suitability of such preparations for certain downstream applications. For
example, a polyclonal antibody preparation containing at least 1-2 mg specific antibody/ml,
preferably 5-10 mg specific antibody/ml, is generally employed in procedures requiring precipitation
of KPP-antibody complexes. Procedures for evaluating antibody specificity, titer, and avidity, and
guidelines for antibody quality and usage in various applications, are generally available (Catty, 
supra; Coligan et al., supra). 

In another embodiment of the invention, polynucleotides encoding KPP, or any fragment or 
complement thereof, may be used for therapeutic purposes. In one aspect, modifications of gene
expression can be achieved by designing complementary sequences or antisense molecules (DNA,
RNA, PNA, or modified oligonucleotides) to the coding or regulatory regions of the gene encoding
KPP. Such technology is well known in the art, and antisense oligonucleotides or larger fragments
can be designed from various locations along the coding or control regions of sequences encoding
KPP (Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press, Totowa NJ).
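
As a minimal illustrative sketch, assuming the sense-strand sequence of a target region is available as a text string, a complementary (antisense) DNA oligonucleotide can be derived as the reverse complement of a chosen window. The function name, the example sequence, and the window coordinates are hypothetical; practical design would also weigh secondary structure, specificity, and oligonucleotide chemistry, none of which is modeled here.

```python
# Watson-Crick complements for a DNA oligonucleotide; U in an RNA target pairs with A.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(sense, start, length):
    """Return the antisense DNA oligo (written 5'->3') for sense[start:start+length].

    `sense` is the sense-strand sequence (DNA or RNA letters), `start` is a
    0-based offset into it, and `length` is the oligo length in nucleotides.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    target = sense[start:start + length].upper()
    if len(target) != length:
        raise ValueError("window extends past the end of the sequence")
    unknown = set(target) - set(_COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported nucleotide symbols: {sorted(unknown)}")
    # Reverse the window, then complement each base.
    return "".join(_COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    # Hypothetical 9-nucleotide sense sequence, for demonstration only.
    print(antisense_oligo("ATGGCCAAG", 0, 9))
```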

In therapeutic use, any gene delivery system suitable for introduction of the antisense
sequences into appropriate target cells can be used. Antisense sequences can be delivered
intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence
complementary to at least a portion of the cellular sequence encoding the target protein (Slater, J.E. et
al. (1998) J. Allergy Clin. Immunol. 102:469-475; Scanlon, K.J. et al. (1995) FASEB J. 9:1288-1296).
Antisense sequences can also be introduced intracellularly through the use of viral vectors,
such as retrovirus and adeno-associated virus vectors (Miller, A.D. (1990) Blood 76:271-278;
Ausubel et al., supra; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63:323-347). Other gene
delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems
known in the art (Rossi, J.J. (1995) Br. Med. Bull. 51:217-225; Boado, R.J. et al. (1998) J. Pharm.
Sci. 87:1308-1315; Morris, M.C. et al. (1997) Nucleic Acids Res. 25:2730-2736).

In another embodiment of the invention, polynucleotides encoding KPP may be used for 
somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency
(e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by
X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined
immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency
(Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475),
cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene
Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial
hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal,
R.G. (1995) Science 270:404-410; Verma, I.M. and N. Somia (1997) Nature 389:239-242)), (ii)
express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated
cell proliferation), or (iii) express a protein which affords protection against intracellular parasites
(e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D.
(1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399),
hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides
brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the
case where a genetic deficiency in KPP expression or regulation causes disease, the expression of
KPP from an appropriate population of transduced cells may alleviate the clinical manifestations
caused by the genetic deficiency.






In a further embodiment of the invention, diseases or disorders caused by deficiencies in KPP
are treated by constructing mammalian expression vectors encoding KPP and introducing these
vectors by mechanical means into KPP-deficient cells. Mechanical transfer technologies for use with
cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold
particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v)
the use of DNA transposons (Morgan, R.A. and W.F. Anderson (1993) Annu. Rev. Biochem. 62:191-217;
Ivics, Z. (1997) Cell 91:501-510; Boulay, J.-L. and H. Récipon (1998) Curr. Opin. Biotechnol.
9:445-450).

Expression vectors that may be effective for the expression of KPP include, but are not
limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX, PCR2-TOPOTA vectors
(Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA),
and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (BD Clontech, Palo Alto CA). KPP
may be expressed using (i) a constitutively active promoter, (e.g., from cytomegalovirus (CMV),
Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible
promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and H. Bujard (1992) Proc. Natl.
Acad. Sci. USA 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and
H.M. Blau (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid
(Invitrogen)); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND;
Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter
(Rossi, F.M.V. and H.M. Blau, supra)), or (iii) a tissue-specific promoter or the native promoter of
the endogenous gene encoding KPP from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID
TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver
polynucleotides to target cells in culture and require minimal effort to optimize experimental
parameters. In the alternative, transformation is performed using the calcium phosphate method
(Graham, F.L. and A.J. Eb (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al.
(1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of
these standardized mammalian transfection protocols.

In another embodiment of the invention, diseases or disorders caused by genetic defects with
respect to KPP expression are treated by constructing a retrovirus vector consisting of (i) the
polynucleotide encoding KPP under the control of an independent promoter or the retrovirus long
terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive
element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences
required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are
commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc.
Natl. Acad. Sci. USA 92:6733-6737), incorporated by reference herein. The vector is propagated in
an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for
receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al.
(1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and
A.D. Miller (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R.
et al. (1998) J. Virol. 72:9873-9880). U.S. Patent No. 5,910,434 to Rigg ("Method for obtaining
retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant")
discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by
reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells),
and the return of transduced cells to a patient are procedures well known to persons skilled in
the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029;
Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716;
Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. USA 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In an embodiment, an adenovirus-based gene therapy delivery system is used to deliver
polynucleotides encoding KPP to cells which have one or more genetic abnormalities with respect to
the expression of KPP. The construction and packaging of adenovirus-based vectors are well known
to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be
versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas
(Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are
described in U.S. Patent No. 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"),
hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999; Annu.
Rev. Nutr. 19:511-544) and Verma, I.M. and N. Somia (1997; Nature 18:389:239-242).

In another embodiment, a herpes-based, gene therapy delivery system is used to deliver
polynucleotides encoding KPP to target cells which have one or more genetic abnormalities with
respect to the expression of KPP. The use of herpes simplex virus (HSV)-based vectors may be
especially valuable for introducing KPP to cells of the central nervous system, for which HSV has a
tropism. The construction and packaging of herpes-based vectors are well known to those with
ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has
been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res.
169:385-395). The construction of a HSV-1 virus vector has also been disclosed in detail in U.S.
Patent No. 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby
incorporated by reference. U.S. Patent No. 5,804,413 teaches the use of recombinant HSV d92 which
consists of a genome containing at least one exogenous gene to be transferred to a cell under the 
control of the appropriate promoter for purposes including human gene therapy. Also taught by this
patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22.
For HSV vectors, see also Goins, W.F. et al. (1999; J. Virol. 73:519-532) and Xu, H. et al. (1994;
Dev. Biol. 163:152-161). The manipulation of cloned herpesvirus sequences, the generation of
recombinant virus following the transfection of multiple plasmids containing different segments of
the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells
with herpesvirus are techniques well known to those of ordinary skill in the art.

In another embodiment, an alphavirus (positive, single-stranded RNA virus) vector is used to 
deliver polynucleotides encoding KPP to target cells. The biology of the prototypic alphavirus, 
Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based
on the SFV genome (Garoff, H. and K.-J. Li (1998) Curr. Opin. Biotechnol. 9:464-469). During
alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid
proteins. This subgenomic RNA replicates to higher levels than the full length genomic RNA,
resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity
(e.g., protease and polymerase). Similarly, inserting the coding sequence for KPP into the alphavirus
genome in place of the capsid-coding region results in the production of a large number of
KPP-coding RNAs and the synthesis of high levels of KPP in vector transduced cells. While alphavirus
infection is typically associated with cell lysis within a few days, the ability to establish a persistent
infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that
the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application
(Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the
introduction of KPP into a variety of cell types. The specific transduction of a subset of cells in a
population may require the sorting of cells prior to transduction. The methods of manipulating
infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and
performing alphavirus infections, are well known to those with ordinary skill in the art.

Oligonucleotides derived from the transcription initiation site, e.g., between about positions
-10 and +10 from the start site, may also be employed to inhibit gene expression. Similarly,
inhibition can be achieved using triple helix base-pairing methodology. Triple helix pairing is useful
because it causes inhibition of the ability of the double helix to open sufficiently for the binding of
polymerases, transcription factors, or regulatory molecules. Recent therapeutic advances using
triplex DNA have been described in the literature (Gee, J.E. et al. (1994) in Huber, B.E. and B.I. Carr,
Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco NY, pp. 163-177). A
complementary sequence or antisense molecule may also be designed to block translation of mRNA
by preventing the transcript from binding to ribosomes.

Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of
RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme
molecule to complementary target RNA, followed by endonucleolytic cleavage. For example,
engineered hammerhead motif ribozyme molecules may specifically and efficiently catalyze
endonucleolytic cleavage of RNA molecules encoding KPP.

Specific ribozyme cleavage sites within any potential RNA target are initially identified by
scanning the target molecule for ribozyme cleavage sites, including the following sequences: GUA,
GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides,
corresponding to the region of the target gene containing the cleavage site, may be evaluated for
secondary structural features which may render the oligonucleotide inoperable. The suitability of
candidate targets may also be evaluated by testing accessibility to hybridization with complementary
oligonucleotides using ribonuclease protection assays.
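
The scanning step described above can be illustrated with a minimal sketch. It assumes a plain-string RNA (or cDNA) sequence and extracts a window of 15 to 20 ribonucleotides around each GUA, GUU, or GUC triplet. The function name, the default window of 17, and the choice to center the window on the triplet are illustrative assumptions and are not taken from the specification; secondary-structure evaluation is not attempted.

```python
import re

def find_cleavage_windows(rna, window=17):
    """Return (site_index, window_start, window_sequence) for each triplet.

    site_index is the 0-based position of the G in GUA, GUU, or GUC.
    The window (15 to 20 nt, per the text) is roughly centered on the
    triplet and clamped to the ends of the sequence (illustrative choice).
    """
    if not 15 <= window <= 20:
        raise ValueError("window must be between 15 and 20 nucleotides")
    # Accept DNA-style input by converting T to U.
    rna = rna.upper().replace("T", "U")
    hits = []
    for match in re.finditer(r"GU[AUC]", rna):
        site = match.start()
        start = max(0, site + 1 - window // 2)
        start = min(start, max(0, len(rna) - window))
        hits.append((site, start, rna[start:start + window]))
    return hits

# Hypothetical usage on a made-up sequence:
for hit in find_cleavage_windows("AAAAAAAAAAGUCAAAAAAAAAA"):
    print(hit)
```

Each returned window would still need to be screened for secondary structure and tested for accessibility, as the text notes.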

Complementary ribonucleic acid molecules and ribozymes may be prepared by any method
known in the art for the synthesis of nucleic acid molecules. These include techniques for chemically
synthesizing oligonucleotides such as solid phase phosphoramidite chemical synthesis. Alternatively,
RNA molecules may be generated by in vitro and in vivo transcription of DNA molecules encoding
KPP. Such DNA sequences may be incorporated into a wide variety of vectors with suitable RNA
polymerase promoters such as T7 or SP6. Alternatively, these cDNA constructs that synthesize
complementary RNA, constitutively or inducibly, can be introduced into cell lines, cells, or tissues.

RNA molecules may be modified to increase intracellular stability and half-life. Possible
modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3'
ends of the molecule, or the use of phosphorothioate or 2' O-methyl rather than phosphodiester
linkages within the backbone of the molecule. This concept is inherent in the production of PNAs
and can be extended in all of these molecules by the inclusion of nontraditional bases such as inosine,
queuosine, and wybutosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of adenine,
cytosine, guanine, thymine, and uracil which are not as easily recognized by endogenous
endonucleases.

In other embodiments of the invention, the expression of one or more selected
polynucleotides of the present invention can be altered, inhibited, decreased, or silenced using RNA
interference (RNAi) or post-transcriptional gene silencing (PTGS) methods known in the art. RNAi
is a post-transcriptional mode of gene silencing in which double-stranded RNA (dsRNA) introduced
into a targeted cell specifically suppresses the expression of the homologous gene (i.e., the gene
bearing the sequence complementary to the dsRNA). This effectively knocks out or substantially
reduces the expression of the targeted gene. PTGS can also be accomplished by use of DNA or DNA
fragments. RNAi methods are described by Fire, A. et al. (1998; Nature 391:806-811) and
Gura, T. (2000; Nature 404:804-808). PTGS can also be initiated by introduction of a
complementary segment of DNA into the selected tissue using gene delivery and/or viral vector
delivery methods described herein or known in the art.

RNAi can be induced in mammalian cells by the use of small interfering RNA, also known as
siRNA. siRNA are shorter segments of dsRNA (typically about 21 to 23 nucleotides in length) that
result in vivo from cleavage of introduced dsRNA by the action of an endogenous ribonuclease.
siRNA appear to be the mediators of the RNAi effect in mammals. The most effective siRNAs
appear to be 21 nucleotide dsRNAs with 2 nucleotide 3' overhangs. The use of siRNA for inducing
RNAi in mammalian cells is described by Elbashir, S.M. et al. (2001; Nature 411:494-498).

siRNA can be generated indirectly by introduction of dsRNA into the targeted cell.
Alternatively, siRNA can be synthesized directly and introduced into a cell by transfection methods
and agents described herein or known in the art (such as liposome-mediated transfection, viral vector
methods, or other polynucleotide delivery/introductory methods). Suitable siRNAs can be selected
by examining a transcript of the target polynucleotide (e.g., mRNA) for nucleotide sequences
downstream from the AUG start codon and recording the occurrence of each nucleotide and the 3'
adjacent 19 to 23 nucleotides as potential siRNA target sites, with sequences having a 21 nucleotide
length being preferred. Regions to be avoided for target siRNA sites include the 5' and 3' untranslated
regions (UTRs) and regions near the start codon (within 75 bases), as these may be richer in
regulatory protein binding sites. UTR-binding proteins and/or translation initiation complexes may
interfere with binding of the siRNP endonuclease complex. The selected target sites for siRNA can
then be compared to the appropriate genome database (e.g., human, etc.) using BLAST or other
sequence comparison algorithms known in the art. Target sequences with significant homology to
other coding sequences can be eliminated from consideration. The selected siRNAs can be produced
by chemical synthesis methods known in the art or by in vitro transcription using commercially
available methods and kits such as the SILENCER siRNA construction kit (Ambion, Austin TX).
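
The site-selection rules above can be sketched as a simple enumeration. This is an illustration only, under stated assumptions: the caller supplies the coding-region boundaries (`cds_start`, `cds_end`, hypothetical parameter names), the window length counts the total target length (19 to 23, defaulting to the preferred 21), and "within 75 bases" is read as the first 75 bases counted from the A of the AUG. The BLAST homology filter is not performed here.

```python
def sirna_candidates(mrna, cds_start, cds_end, length=21, skip=75):
    """List (position, target) windows that lie inside the coding region.

    mrna      : transcript sequence, 5' to 3'
    cds_start : 0-based index of the A of the AUG start codon
    cds_end   : 0-based index one past the stop codon (start of the 3' UTR)
    length    : target length in nucleotides, 19 to 23 (21 preferred)
    skip      : bases downstream of the start codon to avoid (75 in the text)
    """
    if not 19 <= length <= 23:
        raise ValueError("length must be between 19 and 23 nucleotides")
    mrna = mrna.upper().replace("T", "U")
    if mrna[cds_start:cds_start + 3] != "AUG":
        raise ValueError("cds_start does not point at an AUG start codon")
    first = cds_start + skip   # excludes the 5' UTR and the start-codon region
    last = cds_end - length    # keeps every window out of the 3' UTR
    return [(i, mrna[i:i + length]) for i in range(first, last + 1)]
```

The candidates returned would then be compared against a genome database so that sequences with significant homology to other coding sequences can be discarded, as described above.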

In alternative embodiments, long-term gene silencing and/or RNAi effects can be induced in
selected tissue using expression vectors that continuously express siRNA. This can be accomplished
using expression vectors that are engineered to express hairpin RNAs (shRNAs) using methods
known in the art (see, e.g., Brummelkamp, T.R. et al. (2002) Science 296:550-553; and Paddison, P.J.
et al. (2002) Genes Dev. 16:948-958). In these and related embodiments, shRNAs can be delivered to
target cells using expression vectors known in the art. An example of a suitable expression vector for
delivery of siRNA is the PSILENCER1.0-U6 (circular) plasmid (Ambion). Once delivered to the
target tissue, shRNAs are processed in vivo into siRNA-like molecules capable of carrying out
gene-specific silencing.

In various embodiments, the expression levels of genes targeted by RNAi or PTGS methods
can be determined by assays for mRNA and/or protein analysis. Expression levels of the mRNA of a
targeted gene can be determined, for example, by northern analysis methods using the
NORTHERNMAX-GLY kit (Ambion); by microarray methods; by PCR methods; by real time PCR
methods; and by other RNA/polynucleotide assays known in the art or described herein. Expression
levels of the protein encoded by the targeted gene can be determined, for example, by microarray
methods; by polyacrylamide gel electrophoresis; and by Western analysis using standard techniques
known in the art.

An additional embodiment of the invention encompasses a method for screening for a 
compound which is effective in altering expression of a polynucleotide encoding KPP. Compounds
which may be effective in altering expression of a specific polynucleotide may include, but are not 
limited to, oligonucleotides, antisense oligonucleotides, triple helix-forming oligonucleotides, 
transcription factors and other polypeptide transcriptional regulators, and non-macromolecular 
chemical entities which are capable of interacting with specific polynucleotide sequences. Effective
compounds may alter polynucleotide expression by acting as either inhibitors or promoters of 
polynucleotide expression. Thus, in the treatment of disorders associated with increased KPP 
expression or activity, a compound which specifically inhibits expression of the polynucleotide 
encoding KPP may be therapeutically useful, and in the treatment of disorders associated with
decreased KPP expression or activity, a compound which specifically promotes expression of the
polynucleotide encoding KPP may be therapeutically useful. 

In various embodiments, one or more test compounds may be screened for effectiveness in 
altering expression of a specific polynucleotide. A test compound may be obtained by any method 
commonly known in the art, including chemical modification of a compound known to be effective in
altering polynucleotide expression; selection from an existing, commercially-available or proprietary 
library of naturally-occurring or non-natural chemical compounds; rational design of a compound 
based on chemical and/or structural properties of the target polynucleotide; and selection from a
library of chemical compounds created combinatorially or randomly. A sample comprising a 
polynucleotide encoding KPP is exposed to at least one test compound thus obtained. The sample 
may comprise, for example, an intact or permeabilized cell, or an in vitro cell-free or reconstituted
biochemical system. Alterations in the expression of a polynucleotide encoding KPP are assayed by
any method commonly known in the art. Typically, the expression of a specific nucleotide is
detected by hybridization with a probe having a nucleotide sequence complementary to the sequence 
of the polynucleotide encoding KPP. The amount of hybridization may be quantified, thus forming 
the basis for a comparison of the expression of the polynucleotide both with and without exposure to 
one or more test compounds. Detection of a change in the expression of a polynucleotide exposed to 
a test compound indicates that the test compound is effective in altering the expression of the 
polynucleotide. A screen for a compound effective in altering expression of a specific polynucleotide 
can be carried out, for example, using a Schizosaccharomyces pombe gene expression system
(Atkins, D. et al. (1999) U.S. Patent No. 5,932,435; Arndt, G.M. et al. (2000) Nucleic Acids Res.
28:E15) or a human cell line such as HeLa cell (Clarke, M.L. et al. (2000) Biochem. Biophys. Res.
Commun. 268:8-13). A particular embodiment of the present invention involves screening a
combinatorial library of oligonucleotides (such as deoxyribonucleotides, ribonucleotides, peptide
nucleic acids, and modified oligonucleotides) for antisense activity against a specific polynucleotide
sequence (Bruice, T.W. et al. (1997) U.S. Patent No. 5,686,242; Bruice, T.W. et al. (2000) U.S.
Patent No. 6,022,691).
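
The comparison of quantified hybridization with and without exposure to a test compound, described above, can be illustrated by a small sketch. The fold-change measure, the two-fold threshold, and the function names are illustrative assumptions; the specification does not prescribe a particular statistic or cutoff.

```python
def expression_change(signal_with, signal_without):
    """Fold change in hybridization signal caused by the test compound."""
    if signal_without <= 0:
        raise ValueError("the untreated signal must be positive")
    return signal_with / signal_without

def classify_compound(fold, threshold=2.0):
    """Label a compound by its effect on expression (threshold is assumed)."""
    if fold >= threshold:
        return "promoter"
    if fold <= 1.0 / threshold:
        return "inhibitor"
    return "no effect"

# Hypothetical signals in arbitrary units: treated sample, untreated sample.
print(classify_compound(expression_change(40.0, 100.0)))
```

In practice, replicate measurements and an appropriate statistical test would be needed before concluding that a compound alters expression.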

Many methods for introducing vectors into cells or tissues are available and equally suitable
for use in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors may be introduced into stem cells
taken from the patient and clonally propagated for autologous transplant back into that same patient.
Delivery by transfection, by liposome injections, or by polycationic amino polymers may be achieved
using methods which are well known in the art (Goldman, C.K. et al. (1997) Nat. Biotechnol.
15:462-466).

Any of the therapeutic methods described above may be applied to any subject in need of
such therapy, including, for example, mammals such as humans, dogs, cats, cows, horses, rabbits, and
monkeys.

An additional embodiment of the invention relates to the administration of a composition
which generally comprises an active ingredient formulated with a pharmaceutically acceptable
excipient. Excipients may include, for example, sugars, starches, celluloses, gums, and proteins.
Various formulations are commonly known and are thoroughly discussed in the latest edition of
Remington's Pharmaceutical Sciences (Mack Publishing, Easton PA). Such compositions may
consist of KPP, antibodies to KPP, and mimetics, agonists, antagonists, or inhibitors of KPP.

In various embodiments, the compositions described herein, such as pharmaceutical
compositions, may be administered by any number of routes including, but not limited to, oral,
intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary,
transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.

Compositions for pulmonary administration may be prepared in liquid or dry powder form.
These compositions are generally aerosolized immediately prior to inhalation by the patient. In the
case of small molecules (e.g. traditional low molecular weight organic drugs), aerosol delivery of
fast-acting formulations is well-known in the art. In the case of macromolecules (e.g. larger peptides
and proteins), recent developments in the field of pulmonary delivery via the alveolar region of the
lung have enabled the practical delivery of drugs such as insulin to blood circulation (see, e.g., Patton,
J.S. et al., U.S. Patent No. 5,997,848). Pulmonary delivery allows administration without needle
injection, and obviates the need for potentially toxic penetration enhancers.

Compositions suitable for use in the invention include compositions wherein the active
ingredients are contained in an effective amount to achieve the intended purpose. The determination 
of an effective dose is well within the capability of those skilled in the art. 

Specialized forms of compositions may be prepared for direct intracellular delivery of
macromolecules comprising KPP or fragments thereof. For example, liposome preparations
containing a cell-impermeable macromolecule may promote cell fusion and intracellular delivery of
the macromolecule. Alternatively, KPP or a fragment thereof may be joined to a short cationic
N-terminal portion from the HIV Tat-1 protein. Fusion proteins thus generated have been found to
transduce into the cells of all tissues, including the brain, in a mouse model system (Schwarze, S.R. et
al. (1999) Science 285:1569-1572).

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats, rabbits, dogs,
monkeys, or pigs. An animal model may also be used to determine the appropriate concentration
range and route of administration. Such information can then be used to determine useful doses and
routes for administration in humans.

A therapeutically effective dose refers to that amount of active ingredient, for example KPP
or fragments thereof, antibodies to KPP, and agonists, antagonists or inhibitors of KPP, which
ameliorates the symptoms or condition. Therapeutic efficacy and toxicity may be determined by
standard pharmaceutical procedures in cell cultures or with experimental animals, such as by
calculating the ED50 (the dose therapeutically effective in 50% of the population) or LD50 (the dose
lethal to 50% of the population) statistics. The dose ratio of toxic to therapeutic effects is the
therapeutic index, which can be expressed as the LD50/ED50 ratio. Compositions which exhibit large
therapeutic indices are preferred. The data obtained from cell culture assays and animal studies are
used to formulate a range of dosage for human use. The dosage contained in such compositions is
preferably within a range of circulating concentrations that includes the ED50 with little or no toxicity.
The dosage varies within this range depending upon the dosage form employed, the sensitivity of the
patient, and the route of administration.
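
As a worked illustration of the therapeutic index defined above (the LD50/ED50 ratio), the following minimal sketch computes the ratio for made-up dose values. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def therapeutic_index(ld50, ed50):
    """Return the LD50/ED50 ratio; both doses must share the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50

# Hypothetical example: LD50 of 500 mg/kg and ED50 of 25 mg/kg.
print(therapeutic_index(500.0, 25.0))
```

A larger ratio means a wider margin between the effective and the lethal dose, which is why compositions with large therapeutic indices are preferred.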

The exact dosage will be determined by the practitioner, in light of factors related to the
subject requiring treatment. Dosage and administration are adjusted to provide sufficient levels of the
active moiety or to maintain the desired effect. Factors which may be taken into account include the
severity of the disease state, the general health of the subject, the age, weight, and gender of the
subject, time and frequency of administration, drug combination(s), reaction sensitivities, and
response to therapy. Long-acting compositions may be administered every 3 to 4 days, every week,
or biweekly depending on the half-life and clearance rate of the particular formulation.

Normal dosage amounts may vary from about 0.1 µg to 100,000 µg, up to a total dose of
about 1 gram, depending upon the route of administration. Guidance as to particular dosages and
methods of delivery is provided in the literature and generally available to practitioners in the art.
Those skilled in the art will employ different formulations for nucleotides than for proteins or their
inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular cells,
conditions, locations, etc.

DIAGNOSTICS

In another embodiment, antibodies which specifically bind KPP may be used for the
diagnosis of disorders characterized by expression of KPP, or in assays to monitor patients being
treated with KPP or agonists, antagonists, or inhibitors of KPP. Antibodies useful for diagnostic
purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays
for KPP include methods which utilize the antibody and a label to detect KPP in human body fluids
or in extracts of cells or tissues. The antibodies may be used with or without modification, and may
be labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter
molecules, several of which are described above, are known in the art and may be used.

A variety of protocols for measuring KPP, including ELISAs, RIAs, and FACS, are known in
the art and provide a basis for diagnosing altered or abnormal levels of KPP expression. Normal or
standard values for KPP expression are established by combining body fluids or cell extracts taken
from normal mammalian subjects, for example, human subjects, with antibodies to KPP under
conditions suitable for complex formation. The amount of standard complex formation may be
quantitated by various methods, such as photometric means. Quantities of KPP expressed in subject,
control, and disease samples from biopsied tissues are compared with the standard values. Deviation
between standard and subject values establishes the parameters for diagnosing disease.

In another embodiment of the invention, polynucleotides encoding KPP may be used for
diagnostic purposes. The polynucleotides which may be used include oligonucleotides,
complementary RNA and DNA molecules, and PNAs. The polynucleotides may be used to detect
and quantify gene expression in biopsied tissues in which expression of KPP may be correlated with
disease. The diagnostic assay may be used to determine absence, presence, and excess expression of
KPP, and to monitor regulation of KPP levels during therapeutic intervention.

In one aspect, hybridization with PCR probes which are capable of detecting polynucleotides,
including genomic sequences, encoding KPP or closely related molecules may be used to identify
nucleic acid sequences which encode KPP. The specificity of the probe, whether it is made from a
highly specific region, e.g., the 5' regulatory region, or from a less specific region, e.g., a conserved
motif, and the stringency of the hybridization or amplification will determine whether the probe
identifies only naturally occurring sequences encoding KPP, allelic variants, or related sequences.

Probes may also be used for the detection of related sequences, and may have at least 50%
sequence identity to any of the KPP encoding sequences. The hybridization probes of the subject
invention may be DNA or RNA and may be derived from the sequence of SEQ ID NO:44-86 or from 
genomic sequences including promoters, enhancers, and introns of the KPP gene. 

Means for producing specific hybridization probes for polynucleotides encoding KPP include
the cloning of polynucleotides encoding KPP or KPP derivatives into vectors for the production of
mRNA probes. Such vectors are known in the art, are commercially available, and may be used to
synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the
appropriate labeled nucleotides. Hybridization probes may be labeled by a variety of reporter groups,
for example, by radionuclides such as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase
coupled to the probe via avidin/biotin coupling systems, and the like.

Polynucleotides encoding KPP may be used for the diagnosis of disorders associated with
expression of KPP. Examples of such disorders include, but are not limited to, a cardiovascular
disease such as arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease,
aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular
tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary
artery bypass graft surgery, congestive heart failure, ischemic heart disease, angina pectoris,
myocardial infarction, hypertensive heart disease, degenerative valvular heart disease, calcific aortic
valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse,
rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic
endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy,
myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, and complications of
cardiac transplantation, congenital lung anomalies, atelectasis, pulmonary congestion and edema,
pulmonary embolism, pulmonary hemorrhage, pulmonary infarction, pulmonary hypertension,
vascular sclerosis, obstructive pulmonary disease, restrictive pulmonary disease, chronic obstructive
pulmonary disease, emphysema, chronic bronchitis, bronchial asthma, bronchiectasis, bacterial
pneumonia, viral and mycoplasmal pneumonia, lung abscess, pulmonary tuberculosis, diffuse
interstitial diseases, pneumoconioses, sarcoidosis, idiopathic pulmonary fibrosis, desquamative
interstitial pneumonitis, hypersensitivity pneumonitis, pulmonary eosinophilia, bronchiolitis
obliterans-organizing pneumonia, diffuse pulmonary hemorrhage syndromes, Goodpasture's
syndromes, idiopathic pulmonary hemosiderosis, pulmonary involvement in collagen-vascular
disorders, pulmonary alveolar proteinosis, lung tumors, inflammatory and noninflammatory pleural
effusions, pneumothorax, pleural tumors, drug-induced lung disease, radiation-induced lung disease,
and complications of lung transplantation; an immune system disorder such as acquired
immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome,
allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, autoimmune
hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal
dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis,
dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins,
erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's
syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel
syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation,
osteoarthritis, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid
arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus,
systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome,
complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal,
parasitic, protozoal, and helminthic infections, and trauma; a neurological disorder such as epilepsy,
ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease,
Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic
lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis
pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and
viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial
thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases
including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal
familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis,
tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental
retardation and other developmental disorders of the central nervous system including Down
syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve
disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral
nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and
toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety,
and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia,
diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia,
Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial
frontotemporal dementia; a disorder affecting growth and development such as actinic keratosis,
arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease
(MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary
thrombocythemia, renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism,
Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms'
tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome,
myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary
neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism,
hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida,
anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural hearing loss; a lipid
disorder such as fatty liver, cholestasis, primary biliary cirrhosis, carnitine deficiency, carnitine
palmitoyltransferase deficiency, myoadenylate deaminase deficiency, hypertriglyceridemia, lipid
storage disorders such as Fabry's disease, Gaucher's disease, Niemann-Pick's disease, metachromatic
leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and ceroid lipofuscinosis,
abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus, lipodystrophy,
lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid adrenal
hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia,
hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism,
renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous
xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease,
hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and a cell proliferative disorder such as
actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective
tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera,
psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma,
melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland,
bladder, bone, bone marrow, brain, breast, cervix, colon, gall bladder, ganglia, gastrointestinal tract,
heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin,
spleen, testis, thymus, thyroid, uterus, leukemias such as multiple myeloma, and lymphomas such as
Hodgkin's disease. Polynucleotides encoding KPP may be used in Southern or northern analysis, dot
blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat
ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect altered KPP
expression. Such qualitative or quantitative methods are well known in the art.

In a particular embodiment, polynucleotides encoding KPP may be used in assays that detect
the presence of associated disorders, particularly those mentioned above. Polynucleotides
complementary to sequences encoding KPP may be labeled by standard methods and added to a fluid
or tissue sample from a patient under conditions suitable for the formation of hybridization
complexes. After a suitable incubation period, the sample is washed and the signal is quantified and
compared with a standard value. If the amount of signal in the patient sample is significantly altered
in comparison to a control sample then the presence of altered levels of polynucleotides encoding
KPP in the sample indicates the presence of the associated disorder. Such assays may also be used to
evaluate the efficacy of a particular therapeutic treatment regimen in animal studies, in clinical trials,
or to monitor the treatment of an individual patient.

In order to provide a basis for the diagnosis of a disorder associated with expression of KPP,
a normal or standard proffle for expression is established. This may be accomplislied by conibinmg 
body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a 
fragment thereof, encoding KPP, under conditions suitable for hybridization or aiiQ)Iification. 
Standard hybridization may be quantified by comparing the values obtained from normal subjects 
with values from an experiment in which a known amount of a substantially purified polynucleotide 
is used. Standard values obtained in this manner may be conqiared with values obtained from 
san5)les from patients who are symptomatic for a disorder. Deviation from standard values is used to 
establish the presence of a disorder. 
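
By way of non-limiting illustration, the comparison of a patient value against standard values obtained from normal subjects might be sketched as follows. The function name, the use of a z-score, and the cutoff of 2.0 are assumptions introduced here for illustration only; the disclosure does not specify a particular statistic or threshold.

```python
from statistics import mean, stdev


def deviates_from_standard(normal_values, patient_value, z_cutoff=2.0):
    """Compare one patient signal with a standard profile.

    normal_values: hybridization signals from normal subjects (hypothetical data).
    z_cutoff: illustrative threshold, an assumption and not a value from the text.
    Returns (z_score, is_altered).
    """
    mu = mean(normal_values)
    sigma = stdev(normal_values)
    if sigma == 0:
        raise ValueError("normal values show no variation; z-score is undefined")
    z = (patient_value - mu) / sigma
    # Both under- and overexpression count as deviation from the standard.
    return z, abs(z) > z_cutoff
```

A two-sided test is used in this sketch because altered expression may be either a decrease or an increase relative to the standard.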

Once the presence of a disorder is established and a treatment protocol is initiated,
hybridization assays may be repeated on a regular basis to determine if the level of expression in the
patient begins to approximate that which is observed in the normal subject. The results obtained from
successive assays may be used to show the efficacy of treatment over a period ranging from several
days to months.

With respect to cancer, the presence of an abnormal amount of transcript (either under- or
overexpressed) in biopsied tissue from an individual may indicate a predisposition for the
development of the disease, or may provide a means for detecting the disease prior to the appearance
of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals
to employ preventative measures or aggressive treatment earlier, thereby preventing the development
or further progression of the cancer.

Additional diagnostic uses for oligonucleotides designed from the sequences encoding KPP
may involve the use of PCR. These oligomers may be chemically synthesized, generated
enzymatically, or produced in vitro. Oligomers will preferably contain a fragment of a polynucleotide
encoding KPP, or a fragment of a polynucleotide complementary to the polynucleotide encoding
KPP, and will be employed under optimized conditions for identification of a specific gene or
condition. Oligomers may also be employed under less stringent conditions for detection or
quantification of closely related DNA or RNA sequences.

In a particular aspect, oligonucleotide primers derived from polynucleotides encoding KPP
may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions
and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods
of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP)
and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from
polynucleotides encoding KPP are used to amplify DNA using the polymerase chain reaction (PCR).
The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids,
and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR
products in single-stranded form, and these differences are detectable using gel electrophoresis in
non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows
detection of the amplimers in high-throughput equipment such as DNA sequencing machines.
Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of
identifying polymorphisms by comparing the sequence of individual overlapping DNA fragments
which assemble into a common consensus sequence. These computer-based methods filter out
sequence variations due to laboratory preparation of DNA and sequencing errors using statistical
models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be
detected and characterized by mass spectrometry using, for example, the high throughput
MASSARRAY system (Sequenom, Inc., San Diego CA).
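
As a simplified and non-limiting sketch of the in silico approach, candidate polymorphisms may be sought column by column in overlapping fragments that have already been aligned to a common coordinate system. The function name, the quality cutoff, and the minimum read support below are hypothetical parameters chosen for illustration; they stand in for, and are much cruder than, the statistical models and chromatogram analyses referred to above.

```python
from collections import Counter


def candidate_snps(reads, quals, min_qual=20, min_minor_reads=2):
    """Return (column, major_base, minor_base) for candidate SNP columns.

    reads: equal-length, pre-aligned fragment strings (assumed input format).
    quals: per-base quality scores parallel to reads (assumed input format).
    Bases below min_qual are ignored, which discards likely sequencing errors.
    """
    snps = []
    for col in range(len(reads[0])):
        counts = Counter(
            read[col]
            for read, qual in zip(reads, quals)
            if read[col] in "ACGT" and qual[col] >= min_qual
        )
        if len(counts) < 2:
            continue  # every confident base agrees with the consensus
        (major, _), (minor, n_minor) = counts.most_common(2)
        # Require the minor allele in several fragments, not a single read.
        if n_minor >= min_minor_reads:
            snps.append((col, major, minor))
    return snps
```

Requiring the minor base in at least two high-quality fragments is one simple way to separate a recurring variant from an isolated base-calling error.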

SNPs may be used to study the genetic basis of human disease. For example, at least 16
common SNPs have been associated with non-insulin-dependent diabetes mellitus. SNPs are also
useful for examining differences in disease outcomes in monogenic disorders, such as cystic fibrosis,
sickle cell anemia, or chronic granulomatous disease. For example, variants in the mannose-binding
lectin, MBL2, have been shown to be correlated with deleterious pulmonary outcomes in cystic
fibrosis. SNPs also have utility in pharmacogenomics, the identification of genetic variants that
influence a patient's response to a drug, such as life-threatening toxicity. For example, a variation in
N-acetyl transferase is associated with a high incidence of peripheral neuropathy in response to the
anti-tuberculosis drug isoniazid, while a variation in the core promoter of the ALOX5 gene results in
diminished clinical response to treatment with an anti-asthma drug that targets the 5-lipoxygenase
pathway. Analysis of the distribution of SNPs in different populations is useful for investigating
genetic drift, mutation, recombination, and selection, as well as for tracing the origins of populations
and their migrations (Taylor, J.G. et al. (2001) Trends Mol. Med. 7:507-512; Kwok, P.-Y. and Z. Gu
(1999) Mol. Med. Today 5:538-543; Nowotny, P. et al. (2001) Curr. Opin. Neurobiol. 11:637-641).

Methods which may also be used to quantify the expression of KPP include radiolabeling or
biotinylating nucleotides, coamplification of a control nucleic acid, and interpolating results from
standard curves (Melby, P.C. et al. (1993) J. Immunol. Methods 159:235-244; Duplaa, C. et al. (1993)
Anal. Biochem. 212:229-236). The speed of quantitation of multiple samples may be accelerated by
running the assay in a high-throughput format where the oligomer or polynucleotide of interest is
presented in various dilutions and a spectrophotometric or colorimetric response gives rapid
quantitation.
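
Interpolation from a standard curve may be illustrated, without limitation, by the following sketch. Piecewise-linear interpolation between neighboring standards is an assumption made here for simplicity; the cited references may use other curve-fitting procedures, and the function name is hypothetical.

```python
def interpolate_quantity(standards, signal):
    """Estimate an unknown amount from a measured signal.

    standards: list of (known_amount, measured_signal) pairs from a dilution
    series (hypothetical data layout). The signal is assumed to increase
    with amount. Raises ValueError outside the calibrated range.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        # Skip flat segments so the division below is always defined.
        if s1 > s0 and s0 <= signal <= s1:
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standard curve")
```

Refusing to extrapolate beyond the highest or lowest standard is a deliberate choice in this sketch, since a response outside the calibrated range is generally not reliable.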

In further embodiments, oligonucleotides or longer fragments derived from any of the
polynucleotides described herein may be used as elements on a microarray. The microarray can be
used in transcript imaging techniques which monitor the relative expression levels of large numbers
of genes simultaneously as described below. The microarray may also be used to identify genetic
variants, mutations, and polymorphisms. This information may be used to determine gene function,
to understand the genetic basis of a disorder, to diagnose a disorder, to monitor
progression/regression of disease as a function of gene expression, and to develop and monitor the
activities of therapeutic agents in the treatment of disease. In particular, this information may be used
to develop a pharmacogenomic profile of a patient in order to select the most appropriate and
effective treatment regimen for that patient. For example, therapeutic agents which are highly
effective and display the fewest side effects may be selected for a patient based on his/her
pharmacogenomic profile.

In another embodiment, KPP, fragments of KPP, or antibodies specific for KPP may be used
as elements on a microarray. The microarray may be used to monitor or measure protein-protein
interactions, drug-target interactions, and gene expression profiles, as described above.

A particular embodiment relates to the use of the polynucleotides of the present invention to
generate a transcript image of a tissue or cell type. A transcript image represents the global pattern of
gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by
quantifying the number of expressed genes and their relative abundance under given conditions and at
a given time (Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent No. 5,840,484;
hereby expressly incorporated by reference herein). Thus a transcript image may be generated by
hybridizing the polynucleotides of the present invention or their complements to the totality of
transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the
hybridization takes place in high-throughput format, wherein the polynucleotides of the present
invention or their complements comprise a subset of a plurality of elements on a microarray. The
resultant transcript image would provide a profile of gene activity.

Transcript images may be generated using transcripts isolated from tissues, cell lines,
biopsies, or other biological samples. The transcript image may thus reflect gene expression in vivo,
as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.

Transcript images which profile the expression of the polynucleotides of the present
invention may also be used in conjunction with in vitro model systems and preclinical evaluation of
pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental
compounds. All compounds induce characteristic gene expression patterns, frequently termed
molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and
toxicity (Nuwaysir, E.F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and N.L. Anderson
(2000) Toxicol. Lett. 112-113:467-471). If a test compound has a signature similar to that of a
compound with known toxicity, it is likely to share those toxic properties. These fingerprints or
signatures are most useful and refined when they contain expression information from a large number
of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest
quality signature. Even genes whose expression is not altered by any tested compounds are important
as well, as the levels of expression of these genes are used to normalize the rest of the expression
data. The normalization procedure is useful for comparison of expression data after treatment with
different compounds. While the assignment of gene function to elements of a toxicant signature aids
in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the
statistical matching of signatures which leads to prediction of toxicity (see, for example, Press
Release 00-02 from the National Institute of Environmental Health Sciences, released February 29,
2000, available at niehs.nih.gov/oc/news/toxchip.htm). Therefore, it is important and desirable in
toxicological screening using toxicant signatures to include all expressed gene sequences.

In an embodiment, the toxicity of a test compound can be assessed by treating a biological
sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the
treated biological sample are hybridized with one or more probes specific to the polynucleotides of
the present invention, so that transcript levels corresponding to the polynucleotides of the present
invention may be quantified. The transcript levels in the treated biological sample are compared with
levels in an untreated biological sample. Differences in the transcript levels between the two samples
are indicative of a toxic response caused by the test compound in the treated sample.
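
The comparison of treated and untreated transcript levels may be sketched, in a non-limiting manner, as a normalized log ratio. Scaling each sample by the mean level of genes whose expression is not expected to change is one plausible reading of the normalization to unaltered genes; the function name, the dictionary layout, and the choice of a base-2 log ratio are assumptions for illustration.

```python
import math


def normalized_log2_ratios(treated, untreated, reference_genes):
    """Return {gene: log2(treated / untreated)} after normalization.

    treated, untreated: {gene_name: signal} with strictly positive signals
    (hypothetical data layout).
    reference_genes: genes assumed to be unaffected by the test compound.
    """
    def scale(sample):
        # Mean signal of the reference genes in this sample.
        return sum(sample[g] for g in reference_genes) / len(reference_genes)

    t_scale, u_scale = scale(treated), scale(untreated)
    return {
        gene: math.log2((treated[gene] / t_scale) / (untreated[gene] / u_scale))
        for gene in treated
        if gene in untreated and gene not in reference_genes
    }
```

A value near zero suggests no change for that transcript, while large positive or negative values mark transcripts that may contribute to a toxicant signature.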

Another embodiment relates to the use of the polypeptides disclosed herein to analyze the
proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression
in a particular tissue or cell type. Each protein component of a proteome can be subjected
individually to further analysis. Proteome expression patterns, or profiles, are analyzed by
quantifying the number of expressed proteins and their relative abundance under given conditions and
at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the
polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using
two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric
focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate
slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are
visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an
agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot
is generally proportional to the level of the protein in the sample. The optical densities of
equivalently positioned protein spots from different samples, for example, from biological samples
either treated or untreated with a test compound or therapeutic agent, are compared to identify any
changes in protein spot density related to the treatment. The proteins in the spots are partially
sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed
by mass spectrometry. The identity of the protein in a spot may be determined by comparing its
partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences
of interest. In some cases, further sequence data may be obtained for definitive protein identification.
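
The comparison of a partial sequence with a polypeptide of interest may be illustrated by the following non-limiting sketch, which asks whether any run of at least 5 contiguous residues from the partial sequence occurs exactly in the polypeptide. Exact string matching and the function name are assumptions; practical identification would typically tolerate mismatches and use scored alignments.

```python
def matches_polypeptide(partial_seq, polypeptide, min_len=5):
    """True if min_len contiguous residues of partial_seq occur in polypeptide.

    Sequences are one-letter amino acid strings. min_len defaults to 5,
    the preferred minimum stated in the text.
    """
    partial_seq = partial_seq.upper()
    polypeptide = polypeptide.upper()
    # Slide a window of min_len residues along the partial sequence.
    return any(
        partial_seq[i:i + min_len] in polypeptide
        for i in range(len(partial_seq) - min_len + 1)
    )
```

A partial sequence shorter than the minimum length yields no windows and is therefore reported as not matching.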






A proteomic profile may also be generated using antibodies specific for KPP to quantify the
levels of KPP expression. In one embodiment, the antibodies are used as elements on a microarray,
and protein expression levels are quantified by contacting the microarray with the sample and
detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem.
270:103-111; Mendoze, L.G. et al. (1999) Biotechniques 27:778-788). Detection may be performed
by a variety of methods known in the art, for example, by reacting the proteins in the sample with a
thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at
each array element.

Toxicant signatures at the proteome level are also useful for toxicological screening, and
should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor
correlation between transcript and protein abundances for some proteins in some tissues (Anderson,
N.L. and J. Seilhamer (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be
useful in the analysis of compounds which do not significantly affect the transcript image, but which
alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to
rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such
cases.

In another embodiment, the toxicity of a test compound is assessed by treating a biological
sample containing proteins with the test compound. Proteins that are expressed in the treated
biological sample are separated so that the amount of each protein can be quantified. The amount of
each protein is compared to the amount of the corresponding protein in an untreated biological
sample. A difference in the amount of protein between the two samples is indicative of a toxic
response to the test compound in the treated sample. Individual proteins are identified by sequencing
the amino acid residues of the individual proteins and comparing these partial sequences to the
polypeptides of the present invention.

In another embodiment, the toxicity of a test compound is assessed by treating a biological
sample containing proteins with the test compound. Proteins from the biological sample are
incubated with antibodies specific to the polypeptides of the present invention. The amount of
protein recognized by the antibodies is quantified. The amount of protein in the treated biological
sample is compared with the amount in an untreated biological sample. A difference in the amount of
protein between the two samples is indicative of a toxic response to the test compound in the treated
sample.

Microarrays may be prepared, used, and analyzed using methods known in the art (Brennan,
T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA
93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/25116; Shalon, D. et al. (1995)
PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155;
Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662). Various types of microarrays are well known
and thoroughly described in Schena, M., ed. (1999; DNA Microarrays: A Practical Approach, Oxford
University Press, London).

In another embodiment of the invention, nucleic acid sequences encoding KPP may be used
to generate hybridization probes useful in mapping the naturally occurring genomic sequence. Either
coding or noncoding sequences may be used, and in some instances, noncoding sequences may be
preferable over coding sequences. For example, conservation of a coding sequence among members
of a multi-gene family may potentially cause undesired cross hybridization during chromosomal
mapping. The sequences may be mapped to a particular chromosome, to a specific region of a
chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs),
yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1
constructions, or single chromosome cDNA libraries (Harrington, J.J. et al. (1997) Nat. Genet.
15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; Trask, B.J. (1991) Trends Genet. 7:149-154).
Once mapped, the nucleic acid sequences may be used to develop genetic linkage maps, for example,
which correlate the inheritance of a disease state with the inheritance of a particular chromosome
region or restriction fragment length polymorphism (RFLP) (Lander, E.S. and D. Botstein (1986)
Proc. Natl. Acad. Sci. USA 83:7353-7357).

Fluorescent in situ hybridization (FISH) may be correlated with other physical and genetic
map data (Heinz-Ulrich, et al. (1995) in Meyers, supra, pp. 965-968). Examples of genetic map data
can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM)
World Wide Web site. Correlation between the location of the gene encoding KPP on a physical map
and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA
associated with that disorder and thus may further positional cloning efforts.

In situ hybridization of chromosomal preparations and physical mapping techniques, such as
linkage analysis using established chromosomal markers, may be used for extending genetic maps.
Often the placement of a gene on the chromosome of another mammalian species, such as mouse,
may reveal associated markers even if the exact chromosomal locus is not known. This information
is valuable to investigators searching for disease genes using positional cloning or other gene
discovery techniques. Once the gene or genes responsible for a disease or syndrome have been
crudely localized by genetic linkage to a particular genomic region, e.g., ataxia-telangiectasia to
11q22-23, any sequences mapping to that area may represent associated or regulatory genes for
further investigation (Gatti, R.A. et al. (1988) Nature 336:577-580). The nucleotide sequence of the
instant invention may also be used to detect differences in the chromosomal location due to
translocation, inversion, etc., among normal, carrier, or affected individuals.

In another embodiment of the invention, KPP, its catalytic or immunogenic fragments, or
oligopeptides thereof can be used for screening libraries of compounds in any of a variety of drug
screening techniques. The fragment employed in such screening may be free in solution, affixed to a
solid support, borne on a cell surface, or located intracellularly. The formation of binding complexes
between KPP and the agent being tested may be measured.

Another technique for drug screening provides for high throughput screening of compounds
having suitable binding affinity to the protein of interest (Geysen, et al. (1984) PCT application
WO84/03564). In this method, large numbers of different small test compounds are synthesized on a
solid substrate. The test compounds are reacted with KPP, or fragments thereof, and washed. Bound
KPP is then detected by methods well known in the art. Purified KPP can also be coated directly onto
plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing
antibodies can be used to capture the peptide and immobilize it on a solid support.

In another embodiment, one may use competitive drug screening assays in which neutralizing
antibodies capable of binding KPP specifically compete with a test compound for binding KPP. In
this manner, antibodies can be used to detect the presence of any peptide which shares one or more
antigenic determinants with KPP.

In additional embodiments, the nucleotide sequences which encode KPP may be used in any
molecular biology techniques that have yet to be developed, provided the new techniques rely on
properties of nucleotide sequences that are currently known, including, but not limited to, such
properties as the triplet genetic code and specific base pair interactions.

Without further elaboration, it is believed that one skilled in the art can, using the preceding
description, utilize the present invention to its fullest extent. The following embodiments are,
therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure
in any way whatsoever.

The disclosures of all patents, applications, and publications mentioned above and below,
including U.S. Ser. No. 60/467,491, U.S. Ser. No. 60/469,441, U.S. Ser. No. 60/476,408, U.S. Ser.
No. 60/494,656, U.S. Ser. No. 60/524,415, and U.S. Ser. No. 60/528,750, are hereby expressly
incorporated by reference.






EXAMPLES 

I. Construction of cDNA Libraries 



Incyte cDNAs are derived from cDNA libraries described in the LIFESEQ database (Incyte,
Palo Alto CA). Some tissues are homogenized and lysed in guanidinium isothiocyanate, while others
are homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL
(Invitrogen), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates are
centrifuged over CsCl cushions or extracted with chloroform. RNA is precipitated from the lysates
with either isopropanol or sodium acetate and ethanol, or by other routine methods.

Phenol extraction and precipitation of RNA are repeated as necessary to increase RNA purity.
In some cases, RNA is treated with DNase. For most libraries, poly(A)+ RNA is isolated using oligo
d(T)-coupled paramagnetic particles (Promega), OLIGOTEX latex particles (QIAGEN, Chatsworth
CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA is isolated directly
from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit
(Ambion, Austin TX).

In some cases, Stratagene is provided with RNA and constructs the corresponding cDNA
libraries. Otherwise, cDNA is synthesized and cDNA libraries are constructed with the UNIZAP
vector system (Stratagene) or SUPERSCRIPT plasmid system (Invitrogen), using the recommended
procedures or similar methods known in the art (Ausubel et al., supra, ch. 5). Reverse transcription is
initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters are ligated to double
stranded cDNA, and the cDNA is digested with the appropriate restriction enzyme or enzymes. For
most libraries, the cDNA is size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE
CL2B, or SEPHAROSE CL4B column chromatography (Amersham Biosciences) or preparative
agarose gel electrophoresis. cDNAs are ligated into compatible restriction enzyme sites of the
polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid
(Invitrogen, Carlsbad CA), PCDNA2.1 plasmid (Invitrogen), PBK-CMV plasmid (Stratagene),
PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte, Palo Alto CA),
pRARE (Incyte), or pINCY (Incyte), or derivatives thereof. Recombinant plasmids are transformed
into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α,
DH10B, or ElectroMAX DH10B from Invitrogen.
II. Isolation of cDNA Clones

Plasmids obtained as described in Example I are recovered from host cells by in vivo excision
using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids are purified using at least
one of the following: a Magic or WIZARD Minipreps DNA purification system (Promega); an AGTC
Miniprep purification kit (Edge Biosystems, Gaithersburg MD); and QIAWELL 8 Plasmid,
QIAWELL 8 Plus Plasmid, QIAWELL 8 Ultra Plasmid purification systems or the R.E.A.L. PREP 96
plasmid purification kit from QIAGEN. Following precipitation, plasmids are resuspended in 0.1 ml
of distilled water and stored, with or without lyophilization, at 4°C.

Alternatively, plasmid DNA is amplified from host cell lysates using direct link PCR in a
high-throughput format (Rao, V.B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal
cycling steps are carried out in a single reaction mixture. Samples are processed and stored in 384-well
plates, and the concentration of amplified plasmid DNA is quantified fluorometrically using
PICOGREEN dye (Molecular Probes, Eugene OR) and a FLUOROSKAN II fluorescence scanner
(Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis

Incyte cDNA recovered in plasmids as described in Example II are sequenced as follows.
Sequencing reactions are processed using standard methods or high-throughput instrumentation such
as the ABI CATALYST 800 (Applied Biosystems) thermal cycler or the PTC-200 thermal cycler (MJ
Research) in conjunction with the HYDRA microdispenser (Robbins Scientific) or the MICROLAB
2200 (Hamilton) liquid transfer system. cDNA sequencing reactions are prepared using reagents
provided by Amersham Biosciences or supplied in ABI sequencing kits such as the ABI PRISM
BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic
separation of cDNA sequencing reactions and detection of labeled polynucleotides are carried out
using the MEGABACE 1000 DNA sequencing system (Amersham Biosciences); the ABI PRISM 373
or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base
calling software; or other sequence analysis systems known in the art. Reading frames within the
cDNA sequences are identified using standard methods (Ausubel et al., supra, ch. 7). Some of the
cDNA sequences are selected for extension using the techniques disclosed in Example VIII.

Polynucleotide sequences derived from Incyte cDNAs are validated by removing vector,
linker, and poly(A) sequences and by masking ambiguous bases, using algorithms and programs
based on BLAST, dynamic programming, and dinucleotide nearest neighbor analysis. The Incyte
cDNA sequences or translations thereof are then queried against a selection of public databases such
as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases, and BLOCKS,
PRINTS, DOMO, PRODOM; PROTEOME databases with sequences from Homo sapiens, Rattus
norvegicus, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces
pombe, and Candida albicans (Incyte, Palo Alto CA); hidden Markov model (HMM)-based protein
family databases such as PFAM, INCY, and TIGRFAM (Haft, D.H. et al. (2001) Nucleic Acids Res.
29:41-43); and HMM-based protein domain databases such as SMART (Schultz, J. et al. (1998) Proc.
Natl. Acad. Sci. USA 95:5857-5864; Letunic, I. et al. (2002) Nucleic Acids Res. 30:242-244).
(HMM is a probabilistic approach which analyzes consensus primary structures of gene families; see,
for example, Eddy, S.R. (1996) Curr. Opin. Struct. Biol. 6:361-365.) The queries are performed
using programs based on BLAST, FASTA, BLIMPS, and HMMER. The Incyte cDNA sequences are
assembled to produce full length polynucleotide sequences. Alternatively, GenBank cDNAs,
GenBank ESTs, stitched sequences, stretched sequences, or Genscan-predicted coding sequences (see
Examples IV and V) are used to extend Incyte cDNA assemblages to full length. Assembly is
performed using programs based on Phred, Phrap, and Consed, and cDNA assemblages are screened
for open reading frames using programs based on GeneMark, BLAST, and FASTA. The full length
polynucleotide sequences are translated to derive the corresponding full length polypeptide
sequences. Alternatively, a polypeptide may begin at any of the methionine residues of the full length
translated polypeptide. Full length polypeptide sequences are subsequently analyzed by querying
against databases such as the GenBank protein databases (genpept), SwissProt, the PROTEOME
databases, BLOCKS, PRINTS, DOMO, PRODOM, Prosite, hidden Markov model (HMM)-based
protein family databases such as PFAM, INCY, and TIGRFAM; and HMM-based protein domain
databases such as SMART. Full length polynucleotide sequences are also analyzed using
MACDNASIS PRO software (MiraiBio, Alameda CA) and LASERGENE software (DNASTAR).
Polynucleotide and polypeptide sequence alignments are generated using default parameters specified
by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program
(DNASTAR), which also calculates the percent identity between aligned sequences.
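
The notion of percent identity between two aligned sequences may be illustrated by the following non-limiting sketch. It is not the MEGALIGN calculation, whose exact formula is not stated here; excluding gapped columns from the denominator is an assumption made for this example, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over ungapped alignment columns.

    aligned_a, aligned_b: equal-length strings from a pairwise alignment,
    with '-' marking gaps. Columns containing a gap are skipped, which is
    an illustrative convention and only one of several in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Because different programs treat gaps differently, values computed with this sketch need not agree with those reported by any particular alignment package.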

Table 7 summarizes tools, programs, and algorithms used for the analysis and assembly of
Incyte cDNA and full length sequences and provides applicable descriptions, references, and
threshold parameters. The first column of Table 7 shows the tools, programs, and algorithms used,
the second column provides brief descriptions thereof, the third column presents appropriate
references, all of which are incorporated by reference herein in their entirety, and the fourth column
presents, where applicable, the scores, probability values, and other parameters used to evaluate the
strength of a match between two sequences (the higher the score or the lower the probability value,
the greater the identity between two sequences).

The programs described above for the assembly and analysis of full length polynucleotide
and polypeptide sequences are also used to identify polynucleotide sequence fragments from SEQ ID
NO:44-86. Fragments from about 20 to about 4000 nucleotides which are useful in hybridization and
amplification technologies are described in Table 4, column 2.
IV. Identification and Editing of Coding Sequences from Genomic DNA

Putative kinases and phosphatases are initially identified by running the Genscan gene
identification program against public genomic sequence databases (e.g., gbpri and gbhtg). Genscan is
a general-purpose gene identification program which analyzes genomic DNA sequences from a
variety of organisms (Burge, C. and S. Karlin (1997) J. Mol. Biol. 268:78-94; Burge, C. and S. Karlin
(1998) Curr. Opin. Struct. Biol. 8:346-354). The program concatenates predicted exons to form an
assembled cDNA sequence extending from a methionine to a stop codon. The output of Genscan is a
FASTA database of polynucleotide and polypeptide sequences. The maximum range of sequence for
Genscan to analyze at once is set to 30 kb. To determine which of these Genscan predicted cDNA
sequences encode kinases and phosphatases, the encoded polypeptides are analyzed by querying
against PFAM models for kinases and phosphatases. Potential kinases and phosphatases are also
identified by homology to Incyte cDNA sequences that have been annotated as kinases and
phosphatases. These selected Genscan-predicted sequences are then compared by BLAST analysis to
the genpept and gbpri public databases. Where necessary, the Genscan-predicted sequences are then
edited by comparison to the top BLAST hit from genpept to correct errors in the sequence predicted
by Genscan, such as extra or omitted exons. BLAST analysis is also used to find any Incyte cDNA or
public cDNA coverage of the Genscan-predicted sequences, thus providing evidence for transcription.
When Incyte cDNA coverage is available, this information is used to correct or confirm the Genscan
predicted sequence. Full length polynucleotide sequences are obtained by assembling
Genscan-predicted coding sequences with Incyte cDNA sequences and/or public cDNA sequences using the
assembly process described in Example III. Alternatively, full length polynucleotide sequences are
derived entirely from edited or unedited Genscan-predicted coding sequences.
V. Assembly of Genomic Sequence Data with cDNA Sequence Data 
"Stitched" Sequences 

Partial cDNA sequences are extended with exons predicted by the Genscan gene
identification program described in Example IV. Partial cDNAs assembled as described in Example
III are mapped to genomic DNA and parsed into clusters containing related cDNAs and Genscan exon
predictions from one or more genomic sequences. Each cluster is analyzed using an algorithm based
on graph theory and dynamic programming to integrate cDNA and genomic information, generating
possible splice variants that are subsequently confirmed, edited, or extended to create a full length
sequence. Sequence intervals in which the entire length of the interval is present on more than one
sequence in the cluster are identified, and intervals thus identified are considered to be equivalent by
transitivity. For example, if an interval is present on a cDNA and two genomic sequences, then all
three intervals are considered to be equivalent. This process allows unrelated but consecutive
genomic sequences to be brought together, bridged by cDNA sequence. Intervals thus identified are
then "stitched" together by the stitching algorithm in the order that they appear along their parent
sequences to generate the longest possible sequence, as well as sequence variants. Linkages between
intervals which proceed along one type of parent sequence (cDNA to cDNA or genomic sequence to
genomic sequence) are given preference over linkages which change parent type (cDNA to genomic
sequence). The resultant stitched sequences are translated and compared by BLAST analysis to the
genpept and gbpri public databases. Incorrect exons predicted by Genscan are corrected by
comparison to the top BLAST hit from genpept. Sequences are further extended with additional
cDNA sequences, or by inspection of genomic DNA, when necessary.
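
By way of illustration only, the grouping of intervals that are "equivalent by transitivity" can be sketched with a union-find structure. This is a minimal sketch and not the stitching algorithm itself, which is not disclosed in this detail; the function name group_equivalent_intervals and the interval labels are hypothetical.

```python
def group_equivalent_intervals(pairs):
    """Group interval labels into equivalence classes by transitivity.

    `pairs` is an iterable of (a, b) tuples, each stating that interval
    `a` and interval `b` cover the same stretch of sequence.
    """
    parent = {}

    def find(x):
        # Register unseen labels as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = {}
    for label in list(parent):
        groups.setdefault(find(label), set()).add(label)
    # Sorted output makes the result deterministic.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical labels: one cDNA interval observed on two genomic sequences.
observed = [("cDNA:1", "gA:7"), ("cDNA:1", "gB:3"), ("gC:1", "gD:2")]
print(group_equivalent_intervals(observed))
```

In this sketch the cDNA interval links the two genomic intervals gA:7 and gB:3 into one class, mirroring the way unrelated genomic sequences are bridged by cDNA sequence.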
"Stretched" Sequences

Partial DNA sequences are extended to full length with an algorithm based on BLAST
analysis. First, partial cDNAs assembled as described in Example III are queried against public
databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases
using the BLAST program. The nearest GenBank protein homolog is then compared by BLAST
analysis to either Incyte cDNA sequences or GenScan exon predicted sequences described in
Example IV. A chimeric protein is generated by using the resultant high-scoring segment pairs
(HSPs) to map the translated sequences onto the GenBank protein homolog. Insertions or deletions
may occur in the chimeric protein with respect to the original GenBank protein homolog. The
GenBank protein homolog, the chimeric protein, or both are used as probes to search for homologous
genomic sequences from the public human genome databases. Partial DNA sequences are therefore
"stretched" or extended by the addition of homologous genomic sequences. The resultant stretched
sequences are examined to determine whether they contain a complete gene.

VI. Chromosomal Mapping of KPP Encoding Polynucleotides

The sequences used to assemble SEQ ID NO:44-86 are compared with sequences from the
Incyte LIFESEQ database and public domain databases using BLAST and other implementations of
the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO:44-86 are
assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as
Phrap (Table 7). Radiation hybrid and genetic mapping data available from public resources such as
the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR),
and Généthon are used to determine if any of the clustered sequences have been previously mapped.
Inclusion of a mapped sequence in a cluster results in the assignment of all sequences of that cluster,
including its particular SEQ ID NO:, to that map location.

Map locations are represented by ranges, or intervals, of human chromosomes. The map
position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-
arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between
chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in
humans, although this can vary widely due to hot and cold spots of recombination.) The cM
distances are based on genetic markers mapped by Généthon which provide boundaries for radiation
hybrid markers whose sequences were included in each of the clusters. Human genome maps and
other resources available to the public, such as the NCBI "GeneMap'99" World Wide Web site
(ncbi.nlm.nih.gov/genemap/), can be employed to determine if previously identified disease genes
map within or in proximity to the intervals indicated above.

VII. Analysis of Polynucleotide Expression

Northern analysis is a laboratory technique used to detect the presence of a transcript of a
gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs
from a particular cell type or tissue have been bound (Sambrook and Russell, supra, ch. 7; Ausubel et
al., supra, ch. 4).

Analogous computer techniques applying BLAST are used to search for identical or related
molecules in databases such as GenBank or LIFESEQ (Incyte). This analysis is much faster than
multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be 
modified to determine whether any particular match is categorized as exact or similar. The basis of 
the search is the product score, which is defined as: 

\[
\text{Product score} = \frac{\text{BLAST Score} \times \text{Percent Identity}}{5 \times \min\{\text{length(Seq. 1)},\ \text{length(Seq. 2)}\}}
\]

The product score takes into account both the degree of similarity between two sequences and the
length of the sequence match. The product score is a normalized value between 0 and 100, and is
calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the
product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is
calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair
(HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by
gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate
the product score. The product score represents a balance between fractional overlap and quality in a
BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the
entire length of the shorter of the two sequences being compared. A product score of 70 is produced
either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the
other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79%
identity and 100% overlap.
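
By way of illustration only, the product score described above can be computed as in the following minimal sketch. The function names and the representation of an HSP as counts of matching and mismatching bases are assumptions made for the example; gaps are not modeled.

```python
def blast_score(matches, mismatches):
    """Score an ungapped HSP: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(matches, mismatches, len_seq1, len_seq2):
    """Product score for the best HSP between two sequences.

    BLAST score x percent identity, divided by 5 times the length of
    the shorter sequence.
    """
    aligned = matches + mismatches
    if aligned == 0:
        raise ValueError("HSP must contain at least one aligned base")
    percent_identity = 100.0 * matches / aligned
    shorter = min(len_seq1, len_seq2)
    return blast_score(matches, mismatches) * percent_identity / (5 * shorter)


# 100% identity over the full shorter sequence
print(product_score(100, 0, 100, 250))
# 100% identity over 70% of the shorter sequence
print(product_score(70, 0, 100, 100))
# 88% identity over the full length (close to 70)
print(product_score(88, 12, 100, 100))
```

With these inputs the 88% identity case evaluates to about 69 and a 79% identity, full-overlap case to about 49, consistent with the approximate values of 70 and 50 given in the text.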

Alternatively, polynucleotides encoding KPP are analyzed with respect to the tissue sources
from which they are derived. For example, some full length sequences are assembled, at least in part,
with overlapping Incyte cDNA sequences (see Example III). Each cDNA sequence is derived from a
cDNA library constructed from a human tissue. Each human tissue is classified into one of the
following organ/tissue categories: cardiovascular system; connective tissue; digestive system;
embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ
cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas;
respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
The number of libraries in each category is counted and divided by the total number of libraries
across all categories. Similarly, each human tissue is classified into one of the following
disease/condition categories: cancer, cell line, developmental, inflammation, neurological, trauma,
cardiovascular, pooled, and other, and the number of libraries in each category is counted and divided
by the total number of libraries across all categories. The resulting percentages reflect the tissue- and
disease-specific expression of cDNA encoding KPP. cDNA sequences and cDNA library/tissue
information are found in the LIFESEQ database (Incyte, Palo Alto CA).
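
By way of illustration only, the per-category percentages described above can be tallied as in the following minimal sketch. The function name and the example library list are hypothetical; actual library annotations reside in the LIFESEQ database.

```python
from collections import Counter


def category_percentages(library_categories):
    """Return the percentage of libraries falling in each category.

    `library_categories` holds one category label per cDNA library.
    """
    counts = Counter(library_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical tissue categories for four libraries containing a sequence.
libraries = ["liver", "liver", "nervous system", "skin"]
print(category_percentages(libraries))
```

The same function applies unchanged to the disease/condition categories, since both tallies divide a per-category count by the total number of libraries.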



VIII. Extension of KPP Encoding Polynucleotides

Full length polynucleotides are produced by extension of an appropriate fragment of the full
length molecule using oligonucleotide primers designed from this fragment. One primer is
synthesized to initiate 5' extension of the known fragment, and the other primer is synthesized to
initiate 3' extension of the known fragment. The initial primers are designed using OLIGO 4.06
software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in
length, to have a GC content of about 50% or more, and to anneal to the target sequence at
temperatures of about 68°C to about 72°C. Any stretch of nucleotides which would result in hairpin
structures and primer-primer dimerizations is avoided.
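
By way of illustration only, two of the primer criteria above (length and GC content) can be checked as in the following minimal sketch. It is not the OLIGO 4.06 software; the function names and example sequences are hypothetical, and annealing temperature, hairpin, and primer-dimer checks are deliberately left out.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check length (about 22 to 30 nt) and GC content (about 50% or more)."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Hypothetical candidate primers.
print(meets_basic_criteria("GCGCATATGCGCATATGCGCATAT"))  # 24 nt, 50% GC
print(meets_basic_criteria("ATATATATATATATATATATATAT"))  # 24 nt, 0% GC
```

The thresholds are exposed as parameters because the text states them only approximately ("about").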

Selected human cDNA libraries are used to extend the sequence. If more than one extension
is necessary or desired, additional or nested sets of primers are designed.

High fidelity amplification is obtained by PCR using methods well known in the art. PCR is
performed in 96-well plates using the PTC-200 thermal cycler (MJ Research, Inc.). The reaction mix
contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and
2-mercaptoethanol, Taq DNA polymerase (Amersham Biosciences), ELONGASE enzyme (Invitrogen),
and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI
B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps
2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the
parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec;
Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5
min; Step 7: storage at 4°C.

The concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN
quantitation reagent (0.25% (v/v) PICOGREEN; Molecular Probes, Eugene OR) dissolved in 1X TE
and 0.5 µl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar,
Acton MA), allowing the DNA to bind to the reagent. The plate is scanned in a Fluoroskan II
(Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the
concentration of DNA. A 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis
on a 1% agarose gel to determine which reactions are successful in extending the sequence.

The extended nucleotides are desalted and concentrated, transferred to 384-well plates,
digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and
sonicated or sheared prior to religation into pUC 18 vector (Amersham Biosciences). For shotgun
sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels,
fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones were religated
using T4 ligase (New England Biolabs, Beverly MA) into pUC 18 vector (Amersham Biosciences),
treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected
into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, and
individual colonies are picked and cultured overnight at 37°C in 384-well plates in LB/2x carb liquid
media.

The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham
Biosciences) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3
min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4
repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN
reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified
using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2,
v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC
DIRECT kit (Amersham Biosciences) or the ABI PRISM BIGDYE Terminator cycle sequencing
ready reaction kit (Applied Biosystems).

In like manner, full length polynucleotides are verified using the above procedure or are used
to obtain 5' regulatory sequences using the above procedure along with oligonucleotides designed for
such extension, and an appropriate genomic library.

IX. Identification of Single Nucleotide Polymorphisms in KPP Encoding Polynucleotides

Common DNA sequence variants known as single nucleotide polymorphisms (SNPs) are
identified in SEQ ID NO:44-86 using the LIFESEQ database (Incyte). Sequences from the same
gene are clustered together and assembled as described in Example III, allowing the identification of
all sequence variants in the gene. An algorithm consisting of a series of filters is used to distinguish
SNPs from other sequence variants. Preliminary filters remove the majority of basecall errors by
requiring a minimum Phred quality score of 15, and remove sequence alignment errors and errors
resulting from improper trimming of vector sequences, chimeras, and splice variants. An automated
procedure of advanced chromosome analysis is applied to the original chromatogram files in the
vicinity of the putative SNP. Clone error filters use statistically generated algorithms to identify
errors introduced during laboratory processing, such as those caused by reverse transcriptase,
polymerase, or somatic mutation. Clustering error filters use statistically generated algorithms to
identify errors resulting from clustering of close homologs or pseudogenes, or due to contamination
by non-human sequences. A final set of filters removes duplicates and SNPs found in
immunoglobulins or T-cell receptors.
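
By way of illustration only, the minimum Phred quality requirement of the preliminary filter can be sketched as follows. The conversion uses the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong; the function names are hypothetical and the remaining filters are not modeled.

```python
def phred_to_error_probability(quality):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-quality / 10)


def passes_quality_filter(quality, min_quality=15):
    """Keep a base call only if its Phred score meets the minimum."""
    return quality >= min_quality


# A Phred score of 15 corresponds to an error probability of about 3%.
print(phred_to_error_probability(15))
print(passes_quality_filter(15), passes_quality_filter(14))
```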

Certain SNPs are selected for further characterization by mass spectrometry using the high
throughput MASSARRAY system (Sequenom, Inc.) to analyze allele frequencies at the SNP sites in
four different human populations. The Caucasian population comprises 92 individuals (46 male, 46
female), including 83 from Utah, four French, three Venezuelan, and two Amish individuals. The
African population comprises 194 individuals (97 male, 97 female), all African Americans. The
Hispanic population comprises 324 individuals (162 male, 162 female), all Mexican Hispanic. The
Asian population comprises 126 individuals (64 male, 62 female) with a reported parental breakdown
of 43% Chinese, 31% Japanese, 13% Korean, 5% Vietnamese, and 8% other Asian. Allele
frequencies are first analyzed in the Caucasian population; in some cases those SNPs which show no
allelic variance in this population are not further tested in the other three populations.

X. Labeling and Use of Individual Hybridization Probes 

Hybridization probes derived from SEQ ID NO:44-86 are employed to screen cDNAs,
genomic DNAs, or mRNAs. Although the labeling of oligonucleotides, consisting of about 20 base
pairs, is specifically described, essentially the same procedure is used with larger nucleotide
fragments. Oligonucleotides are designed using state-of-the-art software such as OLIGO 4.06
software (National Biosciences) and labeled by combining 50 pmol of each oligomer, 250 µCi of
[γ-32P] adenosine triphosphate (Amersham Biosciences), and T4 polynucleotide kinase (DuPont NEN,
Boston MA). The labeled oligonucleotides are substantially purified using a SEPHADEX G-25
superfine size exclusion dextran bead column (Amersham Biosciences). An aliquot containing 10^
counts per minute of the labeled probe is used in a typical membrane-based hybridization analysis of
human genomic DNA digested with one of the following endonucleases: Ase I, Bgl II, Eco RI, Pst I,
Xba I, or Pvu II (DuPont NEN).

The DNA from each digest is fractionated on a 0.7% agarose gel and transferred to NYTRAN
PLUS nylon membranes (Schleicher & Schuell, Durham NH). Hybridization is carried out for 16
hours at 40°C. To remove nonspecific signals, blots are sequentially washed at room temperature
under conditions of up to, for example, 0.1 x saline sodium citrate and 0.5% sodium dodecyl sulfate.
Hybridization patterns are visualized using autoradiography or an alternative imaging means and
compared.

XI. Microarrays

The linkage or synthesis of array elements upon a microarray can be achieved utilizing
photolithography, piezoelectric printing (ink-jet printing; see, e.g., Baldeschweiler et al., supra),
mechanical microspotting technologies, and derivatives thereof. The substrate in each of the
aforementioned technologies should be uniform and solid with a non-porous surface (Schena, M., ed.
(1999) DNA Microarrays: A Practical Approach, Oxford University Press, London). Suggested
substrates include silicon, silica, glass slides, glass chips, and silicon wafers. Alternatively, a
procedure analogous to a dot or slot blot may also be used to arrange and link elements to the surface
of a substrate using thermal, UV, chemical, or mechanical bonding procedures. A typical array may
be produced using available methods and machines well known to those of ordinary skill in the art
and may contain any appropriate number of elements (Schena, M. et al. (1995) Science 270:467-470;
Shalon, D. et al. (1996) Genome Res. 6:639-645; Marshall, A. and J. Hodgson (1998) Nat.
Biotechnol. 16:27-31).

Full length cDNAs, Expressed Sequence Tags (ESTs), or fragments or oligomers thereof may
comprise the elements of the microarray. Fragments or oligomers suitable for hybridization can be
selected using software well known in the art such as LASERGENE software (DNASTAR). The
array elements are hybridized with polynucleotides in a biological sample. The polynucleotides in
the biological sample are conjugated to a fluorescent label or other molecular tag for ease of
detection. After hybridization, nonhybridized nucleotides from the biological sample are removed,
and a fluorescence scanner is used to detect hybridization at each array element. Alternatively, laser
desorption and mass spectrometry may be used for detection of hybridization. The degree of
complementarity and the relative abundance of each polynucleotide which hybridizes to an element
on the microarray may be assessed. In one embodiment, microarray preparation and usage is
described in detail below.
Tissue or Cell Sample Preparation 

Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and
poly(A)+ RNA is purified using the oligo-(dT) cellulose method. Each poly(A)+ RNA sample is
reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-(dT) primer (21mer), 1X
first strand buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40
µM dCTP, 40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Biosciences). The reverse
transcription reaction is performed in a 25 ml volume containing 200 ng poly(A)+ RNA with
GEMBRIGHT kits (Incyte). Specific control poly(A)+ RNAs are synthesized by in vitro transcription
from non-coding yeast genomic DNA. After incubation at 37°C for 2 hr, each reaction sample (one
with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and
incubated for 20 minutes at 85°C to stop the reaction and degrade the RNA. Samples are purified
using two successive CHROMA SPIN 30 gel filtration spin columns (BD Clontech, Palo Alto CA)
and after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1
mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The sample is then dried to completion
using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2%
SDS.

Microarray Preparation 

Sequences of the present invention are used to generate array elements. Each array element
is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification
uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are
amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5
µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Biosciences).

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope
slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water
washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR
Scientific Products Corporation (VWR), West Chester PA), washed extensively in distilled water,
and coated with 0.05% aminopropyl silane (Sigma-Aldrich, St. Louis MO) in 95% ethanol. Coated
slides are cured in a 110°C oven.

Array elements are applied to the coated glass substrate using a procedure described in U.S.
Patent No. 5,807,522, incorporated herein by reference. 1 µl of the array element DNA, at an average
concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic
apparatus. The apparatus then deposits about 5 nl of array element sample per slide.

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene).
Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water.
Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate
buffered saline (PBS) (Tropix, Inc., Bedford MA) for 30 minutes at 60°C followed by washes in
0.2% SDS and distilled water as before.

Hybridization

Hybridization reactions contain 9 µl of sample mixture consisting of 0.2 µg each of Cy3 and
Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The sample
mixture is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered
with a 1.8 cm2 coverslip. The arrays are transferred to a waterproof chamber having a cavity just
slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the
addition of 140 µl of 5X SSC in a corner of the chamber. The chamber containing the arrays is
incubated for about 6.5 hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash
buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45°C in a second wash buffer (0.1X
SSC), and dried.

Detection

Reporter-labeled hybridization complexes are detected with a microscope equipped with an
Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines
at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is
focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide
containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-
scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a
resolution of 20 micrometers.

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially.
Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477,
Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores.
Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the
signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5.
Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the
laser source, although the apparatus is capable of recording the spectra from both fluorophores
simultaneously.

The sensitivity of the scans is typically calibrated using the signal intensity generated by a
cDNA control species added to the sample mixture at a known concentration. A specific location on
the array contains a complementary DNA sequence, allowing the intensity of the signal at that
location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two samples
from different sources (e.g., representing test and control cells), each labeled with a different
fluorophore, are hybridized to a single array for the purpose of identifying genes that are
differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the
two fluorophores and adding identical amounts of each to the hybridization mixture.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital
(A/D) conversion board (Analog Devices, Inc., Norwood MA) installed in an IBM-compatible PC
computer. The digitized data are displayed as an image where the signal intensity is mapped using a
linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high
signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and
measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping
emission spectra) between the fluorophores using each fluorophore's emission spectrum.
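
By way of illustration only, a linear 20-color transformation of 12-bit digitized intensities can be sketched as follows. The function name and the choice of equal-width bins are assumptions; the text does not specify how the bins are bounded or which colors are assigned between blue and red.

```python
def pseudocolor_bin(intensity, bits=12, n_colors=20):
    """Map a digitized intensity to a color bin index.

    Bin 0 is the low-signal (blue) end and bin n_colors - 1 is the
    high-signal (red) end. Bins have equal width across the A/D range.
    """
    levels = 1 << bits  # 4096 levels for a 12-bit converter
    if not 0 <= intensity < levels:
        raise ValueError("intensity outside the A/D converter range")
    return intensity * n_colors // levels


print(pseudocolor_bin(0), pseudocolor_bin(2048), pseudocolor_bin(4095))
```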

A grid is superimposed over the fluorescence signal image such that the signal from each
spot is centered in each element of the grid. The fluorescence signal within each element is then
integrated to obtain a numerical value corresponding to the average intensity of the signal. The
software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
Array elements that exhibit at least about a two-fold change in expression, a signal-to-background
ratio of at least about 2.5, and an element spot size of at least about 40%, are considered to be
differentially expressed.
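
By way of illustration only, the three criteria above can be applied to a single array element as in the following minimal sketch. It is not the GEMTOOLS program; the function name, the use of a simple Cy5/Cy3 intensity ratio for fold change, and the expression of spot size as a fraction are assumptions.

```python
def is_differentially_expressed(cy3_signal, cy5_signal,
                                signal_to_background, spot_size_fraction):
    """Apply the fold-change, signal-to-background, and spot-size thresholds."""
    if cy3_signal <= 0 or cy5_signal <= 0:
        return False  # a ratio cannot be formed without signal in both channels
    ratio = cy5_signal / cy3_signal
    fold_change = max(ratio, 1.0 / ratio)  # counts increases and decreases alike
    return (fold_change >= 2.0
            and signal_to_background >= 2.5
            and spot_size_fraction >= 0.40)


# Hypothetical element: 2.5-fold increase, good signal and spot size.
print(is_differentially_expressed(100.0, 250.0, 3.0, 0.5))
# Same element with a signal-to-background ratio below the threshold.
print(is_differentially_expressed(100.0, 250.0, 2.0, 0.5))
```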
Expression 

For example, SEQ ID NO:51, SEQ ID NO:53-54, and SEQ ID NO:57 were differentially
expressed in breast carcinoma cell lines versus a cell line derived from normal breast epithelial tissue
as determined by microarray analysis. Gene expression profiles of nonmalignant mammary epithelial
cells were compared to gene expression profiles of various breast carcinoma lines at different stages
of tumor progression. The cells were grown in defined serum-free H14 medium to 70-80%
confluence prior to RNA harvest. Cell lines compared included: a) HMEC, a primary breast
epithelial cell line isolated from a normal donor, b) MCF-10A, a breast mammary gland cell line
isolated from a 36-year-old woman with fibrocystic breast disease, c) MCF7, a nonmalignant breast
adenocarcinoma cell line isolated from the pleural effusion of a 69-year-old female, d) T-47D, a
breast carcinoma cell line isolated from a pleural effusion obtained from a 54-year-old female with an
infiltrating ductal carcinoma of the breast, e) Sk-BR-3, a breast adenocarcinoma cell line isolated
from a malignant pleural effusion of a 43-year-old female, f) BT-20, a breast carcinoma cell line
derived in vitro from cells emigrating out of thin slices of the tumor mass isolated from a 74-year-old
female, g) MDA-mb-231, a breast tumor cell line isolated from the pleural effusion of a 51-year-old
female, and h) MDA-mb-435S, a spindle-shaped strain that evolved from the parent line (435)
isolated by R. Cailleau from pleural effusion of a 31-year-old female with metastatic, ductal
adenocarcinoma of the breast. Expression of SEQ ID NO:53 was increased at least two-fold in MCF7
cells, versus HMECs. In a similar experiment, expression of SEQ ID NO:51 was decreased at least
two-fold in Sk-BR-3 cells versus HMECs. In a similar experiment, expression of SEQ ID NO:54 was
decreased at least two-fold in Sk-BR-3, T-47D, and MCF7 cells versus HMECs. In a similar
experiment, expression of SEQ ID NO:57 was decreased at least two-fold in MDA-mb-231 and
MCF-10A cells versus HMECs. Therefore, in various embodiments, SEQ ID NO:51, SEQ ID NO:53-54,
and SEQ ID NO:57 can be used for one or more of the following: i) monitoring treatment of breast
cancer, ii) diagnostic assays for breast cancer, and iii) developing therapeutics and/or other treatments
for breast cancer.

In another example, SEQ ID NO:45, SEQ ID NO:51, SEQ ID NO:53-54, and SEQ ID NO:57
were differentially expressed in breast carcinoma cell lines versus a cell line derived from a donor
with non-malignant, fibrocystic breast disease as determined by microarray analysis. Gene
expression profiles of nonmalignant mammary epithelial cells were compared to gene expression
profiles of various breast carcinoma lines at different stages of tumor progression. The cells were
grown in defined serum-free TCH medium, defined serum-free H14 medium, or the supplier's
recommended medium to 70-80% confluence prior to RNA harvest and compared to MCF-10A cells
grown in the same medium. Cell lines compared included: a) MCF-10A, a breast mammary gland
(luminal ductal characteristics) cell line isolated from a 36-year-old woman with fibrocystic breast
disease; b) MCF7, a nonmalignant breast adenocarcinoma cell line isolated from the pleural effusion
of a 69-year-old female, c) T-47D, a breast carcinoma cell line isolated from a pleural effusion
obtained from a 54-year-old female with an infiltrating ductal carcinoma of the breast, d) Sk-BR-3, a
breast adenocarcinoma cell line isolated from a malignant pleural effusion of a 43-year-old female, e)
BT-20, a breast carcinoma cell line derived in vitro from the cells emigrating out of thin slices of the
tumor mass isolated from a 74-year-old female, f) MDA-mb-231, a breast tumor cell line isolated
from the pleural effusion of a 51-year-old female, and g) MDA-mb-435S, a spindle-shaped strain that
evolved from the parent line (435) isolated from the pleural effusion of a 31-year-old female with
mstastatiic, ductal adeaocarcinDma of the breast Expression of SEQ JD NO:45 was increased at least 
two-fold in MCF7 cells when grown in either the defined serun>-free H14 medium or the supplier' s 
recommended medium as conspaied with MCF-lOA cells grown under the same conditions. In a 
sirmlar experiment, expression of SEQ ID NO:51 was decreased at least two-fold in Sk-BR-3 cells 
when grown in any of the growth conditions as conopared with MCF-lOA cells grown \mder the same 
conditions. In a simlar experiment, expression of SEQ ID NO:53 was increased at least two-fold in 
MCF7 cells when grown in any of the growth conditions as compared with MCF-lOA cells grown 
under the same conditions. In a similar experiment, expression of SEQ ID NO:54 was decreased at 
least two-fold in Sk-BR-3 cells and T-47D cells when grown in auy of the growth conditions as 
compared with MCF-lOA cells grown under the same conditions. In a sinrilar experiment, expression 
of SEQ ID NO:57 was increased at least two-fold in MDA-mb-23 1 cells when grown in either the 
defined seatraibfiree H14 medium or the supplier's recommended medhmi as coiqpared with MCF-lOA 
cells grown under the same conditions. Therefore, in various erdbodimeots, SEQ ID NO:45, SEQ ID 
NO:51, SEQ ID NO:53-54, and SEQ ID NO:57 can be used for one or more of the following: i) 
monitoring treatment of breast cancer, ii) diagnostic assays for breast cancer, and iii) developing 
therapeutics and/or other treatments for breast cancer. 
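
The two-fold criterion applied in the comparisons above can be sketched as follows. This is a minimal illustration only, not the analysis procedure used to generate the reported data: the function name `call_two_fold`, the intensity values, and the rule that a call must hold in every shared growth condition are assumptions made for the example.

```python
def call_two_fold(test, reference, threshold=2.0):
    """Classify expression in a test cell line against a reference line.

    test, reference: dicts mapping growth condition -> signal intensity.
    Returns "increased" or "decreased" only when the test/reference ratio
    meets the threshold in every shared condition (an assumed rule);
    otherwise returns "unchanged".
    """
    conditions = sorted(set(test) & set(reference))
    if not conditions:
        raise ValueError("no shared growth conditions")
    ratios = [test[c] / reference[c] for c in conditions]
    if all(r >= threshold for r in ratios):
        return "increased"
    if all(r <= 1.0 / threshold for r in ratios):
        return "decreased"
    return "unchanged"


# Hypothetical intensities in arbitrary units, for illustration only.
mcf10a = {"H14": 400.0, "supplier": 500.0}
mcf7 = {"H14": 1000.0, "supplier": 1100.0}
print(call_two_fold(mcf7, mcf10a))
```

Comparing each test line only with reference cells grown in the same medium mirrors the design described above, in which MCF-10A cells serve as the baseline under matched conditions.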

In another example, expression of SEQ ID NO:47 was down-regulated in a breast cancer cell
line (MCF7) treated with TNFα versus untreated MCF7 cells as determined by microarray analysis.
MCF7 cells were treated with 10 ng/mL TNFα for 1, 4, 8, 12, 24, 48, and 72 hours. Treated cells
were compared to untreated cells kept in culture for the same amount of time. Expression of SEQ ID
NO:47 was decreased at least two-fold in MCF7 cells treated with 10 ng/mL TNFα for 4, 8, 24, or 48
hours as compared with untreated MCF7 cells. Therefore, in various embodiments, SEQ ID NO:47
can be used for one or more of the following: i) monitoring treatment of breast cancer, ii) diagnostic
assays for breast cancer, and iii) developing therapeutics and/or other treatments for breast cancer.

In another example, expression of SEQ ID NO:51 was down-regulated in ovary tumor tissue
versus normal ovary tissue as determined by microarray analysis. Expression of SEQ ID NO:51 was
decreased at least two-fold in ovary tumor tissue as compared with matched normal ovary tissue from
the same donor in 1 of 2 donors tested. Therefore, in various embodiments, SEQ ID NO:51 can be
used for one or more of the following: i) monitoring treatment of ovarian cancer, ii) diagnostic assays
for ovarian cancer, and iii) developing therapeutics and/or other treatments for ovarian cancer.

In another example, expression of SEQ ID NO:54 was down-regulated in brain tissue from
donors with Alzheimer's disease (AD) versus brain tissue from a normal donor as determined by
microarray analysis. Specific dissected brain regions from the cerebellum, dentate nucleus, and
vermis of a normal donor were compared to: a) the corresponding regions dissected from the brain of
a female with mild AD; and b) the corresponding regions dissected from the brain of a female with
severe AD. The diagnosis of normal or mild AD was established by a certified neuropathologist
based on microscopic examination of multiple sections throughout the brain. Expression of SEQ ID
NO:54 was decreased at least two-fold in the striatum and globus pallidus region of the brain of a
donor with severe AD and a donor with mild AD as compared with the corresponding region of the
brain from a normal donor. Therefore, in various embodiments, SEQ ID NO:54 can be used for one
or more of the following: i) monitoring treatment of AD, ii) diagnostic assays for AD, and iii)
developing therapeutics and/or other treatments for AD.

In another example, expression of SEQ ID NO:57 was up-regulated in lung tumor tissue
versus normal lung tissue as determined by microarray analysis. Expression of SEQ ID NO:57 was
increased at least two-fold in lung tumor tissue as compared with matched normal lung tissue from
the same donor in 3 of 4 donors tested. Therefore, in various embodiments, SEQ ID NO:57 can be
used for one or more of the following: i) monitoring treatment of lung cancer, ii) diagnostic assays for
lung cancer, and iii) developing therapeutics and/or other treatments for lung cancer.

In another example, expression of SEQ ID NO:57 was down-regulated to a lesser extent in
preadipocytes taken from an obese donor versus preadipocytes taken from a non-obese donor as
determined by microarray analysis. Primary subcutaneous preadipocytes were isolated from the
adipose tissue of a non-obese donor, a 28-year-old healthy female with a body mass index (BMI) of
23.59, and an obese donor, a 40-year-old healthy female with a body mass index (BMI) of 32.47.
The preadipocytes from each donor were cultured and induced to differentiate into adipocytes by
growing them in differentiation medium containing PPAR-γ agonist and human insulin (Zen-Bio).
Some thiazolidinediones or PPAR-γ agonists, which bind and activate an orphan nuclear receptor,
PPAR-γ, have been shown to induce human adipocyte differentiation. The preadipocytes were
treated with human insulin and PPAR-γ agonist for 3 days and subsequently were switched to
medium containing insulin for a range of time periods ranging from one to 20 days before the cells
were collected for analysis. Differentiated adipocytes from each donor were compared to untreated
preadipocytes, maintained in culture in the absence of differentiation-inducing agents, from the same
donor. Between 80% and 90% of the preadipocytes finally differentiated to adipocytes as observed
under phase contrast microscopy. Expression of SEQ ID NO:57 was decreased at least two-fold in
differentiated preadipocytes from a normal donor versus non-differentiated preadipocytes from the
same donor. In contrast, no differential expression was seen in differentiated preadipocytes from an
obese donor versus non-differentiated preadipocytes from the same donor. These data suggest that
SEQ ID NO:57 is differentially expressed in adipocytes from normal subjects but not in adipocytes
from obese subjects. Therefore, in various embodiments, SEQ ID NO:57 can be used for one or more
of the following: i) monitoring treatment of diabetes mellitus and other disorders, such as obesity and
hypertension, ii) diagnostic assays for diabetes mellitus and other disorders, such as obesity and
hypertension, and iii) developing therapeutics and/or other treatments for diabetes mellitus and other
disorders, such as obesity and hypertension.

In another example, SEQ ID NO:47, SEQ ID NO:54, and SEQ ID NO:56 showed tissue-specific
expression as determined by microarray analysis. RNA samples isolated from a variety of
normal human tissues were compared to a common reference sample. Tissues contributing to the
reference sample were selected for their ability to provide a complete distribution of RNA in the
human body and include brain (4%), heart (7%), kidney (3%), lung (8%), placenta (46%), small
intestine (9%), spleen (3%), stomach (6%), testis (9%), and uterus (5%). The normal tissues assayed
were obtained from at least three different donors. RNA from each donor was separately isolated and
individually hybridized to the microarray. Since these hybridization experiments were conducted
using a common reference sample, differential expression values are directly comparable from one
tissue to another. The expression of SEQ ID NO:47 was increased by at least two-fold in small
intestine and liver as compared to the reference sample. Therefore, SEQ ID NO:47 can be used as a
tissue marker for small intestine and liver. The expression of SEQ ID NO:54 was increased by at
least two-fold in brain (temporal cortex) and leukocytes as compared to the reference sample.
Therefore, SEQ ID NO:54 can be used as a tissue marker for brain (temporal cortex) and leukocytes.
The expression of SEQ ID NO:56 was increased by at least two-fold in brain as compared to the
reference sample. Therefore, SEQ ID NO:56 can be used as a tissue marker for brain.
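
The pooled reference composition and the two-fold marker criterion described above can be illustrated with a short sketch. The tissue percentages are those stated in the text. The function name `marker_tissues`, the use of the median across donors to summarize each tissue, and all ratio values are assumptions made for the example, since the text does not say how per-donor values were combined.

```python
from statistics import median

# Percentage contribution of each tissue to the common reference sample,
# as listed in the text.
REFERENCE_POOL = {
    "brain": 4, "heart": 7, "kidney": 3, "lung": 8, "placenta": 46,
    "small intestine": 9, "spleen": 3, "stomach": 6, "testis": 9, "uterus": 5,
}


def marker_tissues(ratios_by_tissue, threshold=2.0, min_donors=3):
    """Return tissues whose median tissue/reference ratio meets the threshold.

    ratios_by_tissue: dict mapping tissue -> list of per-donor ratios.
    Tissues with fewer than min_donors donors are skipped, reflecting the
    stated requirement of at least three donors per tissue.
    """
    selected = []
    for tissue, ratios in ratios_by_tissue.items():
        if len(ratios) >= min_donors and median(ratios) >= threshold:
            selected.append(tissue)
    return sorted(selected)


# Hypothetical per-donor ratios, for illustration only.
example = {
    "small intestine": [2.4, 3.1, 2.2],
    "liver": [2.0, 2.6, 2.1],
    "heart": [0.9, 1.1, 1.0],
    "kidney": [2.5, 2.8],  # only two donors, so not evaluated
}
print(sum(REFERENCE_POOL.values()))
print(marker_tissues(example))
```

Because every tissue is measured against the same pooled reference, the ratios for different tissues share a common denominator, which is what makes them directly comparable.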

In another example, SEQ ID NO:44 showed tissue-specific expression as determined by
microarray analysis. RNA samples isolated from a variety of normal human tissues were compared to
a common reference sample. Tissues contributing to the reference sample were selected for their
ability to provide a complete distribution of RNA in the human body and include brain (4%), heart
(7%), kidney (3%), lung (8%), placenta (46%), small intestine (9%), spleen (3%), stomach (6%),
testis (9%), and uterus (5%). The normal tissues assayed were obtained from at least three different
donors. RNA from each donor was separately isolated and individually hybridized to the microarray.
Since these hybridization experiments were conducted using a common reference sample, differential
expression values are directly comparable from one tissue to another. The expression of SEQ ID
NO:44 was increased by at least two-fold in leukocytes, thymus gland, and tonsil as compared to the
reference sample. Therefore, SEQ ID NO:44 can be used as a tissue marker for leukocytes, thymus
gland, and tonsil.

In another example, SEQ ID NO:48-50 showed tissue-specific expression as determined by
microarray analysis. RNA samples isolated from a variety of normal human tissues were compared to
a common reference sample. Tissues contributing to the reference sample were selected for their
ability to provide a complete distribution of RNA in the human body and include brain (4%), heart
(7%), kidney (3%), lung (8%), placenta (46%), small intestine (9%), spleen (3%), stomach (6%),
testis (9%), and uterus (5%). The normal tissues assayed were obtained from at least three different
donors. RNA from each donor was separately isolated and individually hybridized to the microarray.
Since these hybridization experiments were conducted using a common reference sample, differential
expression values are directly comparable from one tissue to another. The expression of SEQ ID
NO:48-50 was increased by at least two-fold in muscle, adipose tissue, and liver as compared to the
reference sample. Therefore, SEQ ID NO:48-50 can be used as a tissue marker for muscle, adipose
tissue, and liver.

In another example, expression of SEQ ID NO:62 was up-regulated in breast cancer cell lines
versus a breast epithelial cell line derived from normal breast tissue as determined by microarray
analysis. Gene expression profiles of nonmalignant mammary epithelial cells were compared to gene
expression profiles of various breast carcinoma lines at different stages of tumor progression. The
cells were grown in defined serum-free H14 medium to 70-80% confluence prior to RNA harvest.
Cell lines compared included: a) HMEC, a primary breast epithelial cell line isolated from a normal
donor, b) MCF-10A, a breast mammary gland cell line isolated from a 36-year-old woman with
fibrocystic breast disease, c) MCF7, a nonmalignant breast adenocarcinoma cell line isolated from the
pleural effusion of a 69-year-old female, d) T-47D, a breast carcinoma cell line isolated from a
pleural effusion obtained from a 54-year-old female with an infiltrating ductal carcinoma of the
breast, e) Sk-BR-3, a breast adenocarcinoma cell line isolated from a malignant pleural effusion of a
43-year-old female, f) BT-20, a breast carcinoma cell line derived in vitro from cells emigrating out
of thin slices of the tumor mass isolated from a 74-year-old female, g) MDA-mb-231, a breast tumor
cell line isolated from the pleural effusion of a 51-year-old female, and h) MDA-mb-435S, a
spindle-shaped strain that evolved from the parent line (435) isolated by R. Cailleau from pleural effusion of
a 31-year-old female with metastatic, ductal adenocarcinoma of the breast. Expression of SEQ ID
NO:62 was increased at least two-fold in two (MDA-mb-231 and MCF-10A) of seven breast cancer
cell lines tested compared to HMECs. Therefore, in various embodiments, SEQ ID NO:62 can be
used for one or more of the following: i) monitoring treatment of breast cancer, ii) diagnostic assays
for breast cancer, and iii) developing therapeutics and/or other treatments for breast cancer.

In another example, expression of SEQ ID NO:62 was up-regulated in lung cancer tissue
versus normal lung tissue as determined by microarray analysis. Expression of SEQ ID NO:62 was
increased at least two-fold in lung tumor tissue versus matched normal lung tissue from the same
donor in three of three donors with squamous cell cancer tested. Therefore, in various embodiments,
SEQ ID NO:62 can be used for one or more of the following: i) monitoring treatment of lung cancer,
ii) diagnostic assays for lung cancer, and iii) developing therapeutics and/or other treatments for lung
cancer.

In another example, expression of SEQ ID NO:62 was down-regulated to a lesser extent in
preadipocytes taken from an obese donor versus preadipocytes taken from a non-obese donor as
determined by microarray analysis. Primary subcutaneous preadipocytes were isolated from the
adipose tissue of a non-obese donor, a 28-year-old healthy female with a body mass index (BMI) of
23.59, and an obese donor, a 40-year-old healthy female with a body mass index (BMI) of 32.47.
The preadipocytes from each donor were cultured and induced to differentiate into adipocytes by
growing them in differentiation medium containing PPAR-γ agonist and human insulin (Zen-Bio).
Some thiazolidinediones or PPAR-γ agonists, which bind and activate an orphan nuclear receptor,
PPAR-γ, have been shown to induce human adipocyte differentiation. The preadipocytes were
treated with human insulin and PPAR-γ agonist for 3 days and subsequently were switched to
medium containing insulin for a range of time periods ranging from one to 20 days before the cells
were collected for analysis. Differentiated adipocytes from each donor were compared to untreated
preadipocytes, maintained in culture in the absence of differentiation-inducing agents, from the same
donor. Between 80% and 90% of the preadipocytes finally differentiated to adipocytes as observed
under phase contrast microscopy. Expression of SEQ ID NO:62 was decreased at least two-fold in
differentiated preadipocytes from a normal donor versus non-differentiated preadipocytes from the
same donor. In contrast, no differential expression was seen in differentiated preadipocytes from an
obese donor versus non-differentiated preadipocytes from the same donor. These data suggest that
SEQ ID NO:62 is differentially expressed in adipocytes from normal subjects but not in adipocytes
from obese subjects. Therefore, in various embodiments, SEQ ID NO:62 can be used for one or more
of the following: i) monitoring treatment of diabetes mellitus and other disorders, such as obesity and
hypertension, ii) diagnostic assays for diabetes mellitus and other disorders, such as obesity and
hypertension, and iii) developing therapeutics and/or other treatments for diabetes mellitus and other
disorders, such as obesity and hypertension.

In another example, expression of SEQ ID NO:69 was down-regulated in diseased lung tissue
versus normal lung tissue as determined by microarray analysis. Expression of SEQ ID NO:69 was
decreased at least two-fold in the lung tumor tissue with squamous cell carcinoma as compared to
grossly uninvolved lung tissue from the same donor using a pair comparison experimental design.
Therefore, in various embodiments, SEQ ID NO:69 can be used for one or more of the following: i)
monitoring treatment of lung cancer, ii) diagnostic assays for lung cancer, and iii) developing
therapeutics and/or other treatments for lung cancer.

In another example, expression of SEQ ID NO:74 was downregulated in brain tissue affected
by Alzheimer's Disease versus normal brain tissue as determined by microarray analysis. Specific
dissected brain regions from the brains of patients with AD were compared to dissected regions from
normal brain. The diagnosis of normal or AD was established by a certified neuropathologist based
on microscopic examination of multiple sections throughout the brain. Expression of SEQ ID NO:74
was decreased at least two-fold in 7 of 10 AD-affected tissue samples. Therefore, in various
embodiments, SEQ ID NO:74 can be used for one or more of the following: i) monitoring treatment
of Alzheimer's Disease, ii) diagnostic assays for Alzheimer's Disease, and iii) developing
therapeutics and/or other treatments for Alzheimer's Disease as determined by microarray analysis.

As another example, SEQ ID NO:72 and SEQ ID NO:74 were downregulated in breast cancer
cells versus nonmalignant mammary epithelial cells, as determined by microarray analysis. Cell lines
compared included: a) MCF-10A, a breast mammary gland (luminal ductal characteristics) cell line
isolated from a 36-year-old woman with fibrocystic breast disease, b) MCF7, a nonmalignant breast
adenocarcinoma cell line isolated from the pleural effusion of a 69-year-old female, c) BT-20, a
breast carcinoma cell line derived in vitro from the cells emigrating out of thin slices of tumor mass
isolated from a 74-year-old female, d) T-47D, a breast carcinoma cell line isolated from a pleural
effusion obtained from a 54-year-old female with an infiltrating ductal carcinoma of the breast, e)
Sk-BR-3, a breast adenocarcinoma cell line isolated from a malignant pleural effusion of a 43-year-old
female, f) MDA-mb-231, a breast tumor cell line isolated from the pleural effusion of a 51-year-old
female, g) MDA-mb-435S, a spindle-shaped strain that evolved from the parent line (435) isolated by
R. Cailleau from pleural effusion of a 31-year-old female with metastatic, ductal adenocarcinoma of
the breast, and h) HMEC, a primary breast epithelial cell line isolated from a normal donor.
Expression of SEQ ID NO:72 was decreased at least two-fold in the Sk-BR-3, BT-20,
MDA-mb-435S, T-47D, and MCF7 cell lines as compared to the normal breast epithelial cells. Expression of
SEQ ID NO:74 was decreased at least two-fold in the MCF-10A, T-47D, Sk-BR-3, and MCF7 cell
lines as compared to the normal breast epithelial cells. Therefore, in various embodiments, SEQ ID
NO:72 and SEQ ID NO:74 can be used for one or more of the following: i) monitoring treatment of
breast cancer, ii) diagnostic assays for breast cancer, and iii) developing therapeutics and/or other
treatments for breast cancer as determined by microarray analysis.

As another example, SEQ ID NO:74 and SEQ ID NO:77 showed tissue-specific expression as
determined by microarray analysis. RNA samples isolated from a variety of normal human tissues
were compared to a common reference sample. Tissues contributing to the reference sample were
selected for their ability to provide a complete distribution of RNA in the human body and include
brain (4%), heart (7%), kidney (3%), lung (8%), placenta (46%), small intestine (9%), spleen (3%),
stomach (6%), testis (9%), and uterus (5%). The normal tissues assayed were obtained from at least
three different donors. RNA from each donor was separately isolated and individually hybridized to
the microarray. Since these hybridization experiments were conducted using a common reference
sample, differential expression values are directly comparable from one tissue to another. The
expression of SEQ ID NO:74 was increased by at least two-fold in brain cortex tissue as compared to
the reference sample. Therefore, SEQ ID NO:74 can be used as a tissue marker for brain cortex
tissue. The expression of SEQ ID NO:77 was increased by at least two-fold in heart tissue as
compared to the reference sample. Therefore, SEQ ID NO:77 can be used as a tissue marker for heart
tissue.

XII. Complementary Polynucleotides

Sequences complementary to the KPP-encoding sequences, or any parts thereof, are used to
detect, decrease, or inhibit expression of naturally occurring KPP. Although use of oligonucleotides
comprising from about 15 to 30 base pairs is described, essentially the same procedure is used with
smaller or with larger sequence fragments. Appropriate oligonucleotides are designed using OLIGO
4.06 software (National Biosciences) and the coding sequence of KPP. To inhibit transcription, a
complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent
promoter binding to the coding sequence. To inhibit translation, a complementary oligonucleotide is
designed to prevent ribosomal binding to the KPP-encoding transcript.
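
The idea of a complementary oligonucleotide can be illustrated with a short sketch that takes a window of a coding sequence and returns its reverse complement. This is not a reimplementation of the OLIGO 4.06 software named above, and it performs no uniqueness, melting temperature, or secondary structure checks. The function names and the 30-nucleotide target sequence are hypothetical and are used for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(coding_seq, start=0, length=20):
    """Return an oligonucleotide complementary to a window of coding_seq.

    The window begins at the 0-based position `start` and spans `length`
    nucleotides. The default of 20 lies within the range of about 15 to 30
    described in the text; other lengths are accepted.
    """
    window = coding_seq.upper()[start:start + length]
    if len(window) < length:
        raise ValueError("window extends past the end of the sequence")
    if set(window) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return reverse_complement(window)


# Hypothetical 30-nucleotide 5' coding region, for illustration only.
target = "ATGGCTAGCAAGGTCCTGAAGCTTGACCGT"
print(antisense_oligo(target, start=0, length=20))
```

Taking the window from the 5' end of the sequence follows the description above, where the oligonucleotide is designed from the most unique 5' sequence.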
XIII. Expression of KPP

Expression and purification of KPP is achieved using bacterial or virus-based expression
systems. For expression of KPP in bacteria, cDNA is subcloned into an appropriate vector containing
an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription.
Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the
T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element.
Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic
resistant bacteria express KPP upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG).
Expression of KPP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with
recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as
baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding KPP
by either homologous recombination or bacterial-mediated transposition involving transfer plasmid
intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels
of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect
cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional
genetic modifications to baculovirus (Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA
91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945).

In most expression systems, KPP is synthesized as a fusion protein with, e.g., glutathione
S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step,
affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a
26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on
immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham
Biosciences). Following purification, the GST moiety can be proteolytically cleaved from KPP at
specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification
using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak).
6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins
(QIAGEN). Methods for protein expression and purification are discussed in Ausubel et al. (supra,
ch. 10 and 16). Purified KPP obtained by these methods can be used directly in the assays shown in
Examples XVII, XVIII, XIX, XX, and XXI, where applicable.
XIV. Functional Assays

KPP function is assessed by expressing the sequences encoding KPP at physiologically
elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression
vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice
include PCMV SPORT plasmid (Invitrogen, Carlsbad CA) and PCR3.1 plasmid (Invitrogen), both of
which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently
transfected into a human cell line, for example, an endothelial or hematopoietic cell line, using either
liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences
encoding a marker protein are co-transfected. Expression of a marker protein provides a means to
distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression
from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein
(GFP; BD Clontech), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated,
laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to
evaluate the apoptotic state of the cells and other cellular properties. FCM detects and quantifies the
uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These
events include changes in nuclear DNA content as measured by staining of DNA with propidium
iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side
light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine
uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity
with specific antibodies; and alterations in plasma membrane composition as measured by the binding
of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are
discussed in Ormerod, M.G. (1994; Flow Cytometry, Oxford, New York NY).

The influence of KPP on gene expression can be assessed using highly purified populations
of cells transfected with sequences encoding KPP and either CD64 or CD64-GFP. CD64 and
CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human
immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using
magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Lake Success
NY). mRNA can be purified from the cells using methods well known by those of skill in the art.
Expression of mRNA encoding KPP and other genes of interest can be analyzed by northern analysis
or microarray techniques.

XV. Production of KPP Specific Antibodies 

KPP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g.,
Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to
immunize animals (e.g., rabbits, mice, etc.) and to produce antibodies using standard protocols.

Alternatively, the KPP amino acid sequence is analyzed using LASERGENE software
(DNASTAR) to determine regions of high immunogenicity, and a corresponding oligopeptide is
synthesized and used to raise antibodies by means known to those of skill in the art. Methods for
selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well
described in the art (Ausubel et al., supra, ch. 11).

Typically, oligopeptides of about 15 residues in length are synthesized using an ABI 431A
peptide synthesizer (Applied Biosystems) using FMOC chemistry and coupled to KLH
(Sigma-Aldrich, St. Louis MO) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to
increase immunogenicity (Ausubel et al., supra). Rabbits are immunized with the oligopeptide-KLH
complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide and anti-KPP
activity by, for example, binding the peptide or KPP to a substrate, blocking with 1% BSA, reacting
with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.
XVI. Purification of Naturally Occurring KPP Using Specific Antibodies

Naturally occurring or recombinant KPP is substantially purified by immunoaffinity
chromatography using antibodies specific for KPP. An immunoaffinity column is constructed by
covalently coupling anti-KPP antibody to an activated chromatographic resin, such as CNBr-activated
SEPHAROSE (Amersham Biosciences). After the coupling, the resin is blocked and washed
according to the manufacturer's instructions.

Media containing KPP are passed over the immunoaffinity column, and the column is washed
under conditions that allow the preferential absorbance of KPP (e.g., high ionic strength buffers in the
presence of detergent). The column is eluted under conditions that disrupt antibody/KPP binding
(e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate
ion), and KPP is collected.

XVII. Identification of Molecules Which Interact with KPP

KPP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent
(Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539). Candidate molecules previously
arrayed in the wells of a multi-well plate are incubated with the labeled KPP, washed, and any wells
with labeled KPP complex are assayed. Data obtained using different concentrations of KPP are used
to calculate values for the number, affinity, and association of KPP with the candidate molecules.
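
One conventional way to derive the number of binding sites and the affinity from binding data collected at several concentrations is a Scatchard analysis, sketched below. The text does not name the calculation used, so the choice of Scatchard analysis, the single-site binding model, the function name `scatchard_fit`, and the synthetic data are all assumptions made for illustration.

```python
def scatchard_fit(free, bound):
    """Estimate Kd and Bmax from a Scatchard plot for single-site binding.

    For one class of sites, bound/free = (Bmax - bound) / Kd, so a plot of
    bound/free against bound is a straight line with slope -1/Kd and
    x-intercept Bmax. The line is fitted by ordinary least squares.
    """
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


# Synthetic, noise-free data generated from assumed values Kd = 5, Bmax = 100
# (arbitrary concentration units), for illustration only.
free_conc = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
bound_conc = [100.0 * f / (5.0 + f) for f in free_conc]
kd, bmax = scatchard_fit(free_conc, bound_conc)
print(round(kd, 3), round(bmax, 3))
```

With real measurements the points scatter around the line, and curvature in the plot would suggest that the single-site assumption does not hold.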

Alternatively, molecules interacting with KPP are analyzed using the yeast two-hybrid
system as described in Fields, S. and O. Song (1989; Nature 340:245-246), or using commercially
available kits based on the two-hybrid system, such as the MATCHMAKER system (BD Clontech).

KPP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) 
which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions
between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S.
Patent No. 6,057,101). 
XVIII. Demonstration of KPP Activity

Generally, protein kinase activity is measured by quantifying the phosphorylation of a protein
substrate by KPP in the presence of [γ-32P]ATP. KPP is incubated with the protein substrate,
32P-ATP, and an appropriate kinase buffer. The 32P incorporated into the substrate is separated from
free 32P-ATP by electrophoresis and the incorporated 32P is counted using a radioisotope counter. The
amount of incorporated 32P is proportional to the activity of KPP. A determination of the specific
amino acid residue phosphorylated is made by phosphoamino acid analysis of the hydrolyzed protein.

In one alternative, protein kinase activity is measured by quantifying the transfer of gamma
phosphate from adenosine triphosphate (ATP) to a serine, threonine or tyrosine residue in a protein
substrate. The reaction occurs between a protein kinase sample with a biotinylated peptide substrate
and gamma 32P-ATP. Following the reaction, free avidin in solution is added for binding to the
biotinylated 32P-peptide product. The binding sample then undergoes a centrifugal ultrafiltration
process with a membrane which will retain the product-avidin complex and allow passage of free
gamma 32P-ATP. The reservoir of the centrifuged unit containing the 32P-peptide product as retentate
is then counted in a scintillation counter. This procedure allows the assay of any type of protein
kinase sample, depending on the peptide substrate and kinase reaction buffer selected. This assay is
provided in kit form (ASUA, Affinity Ultrafiltration Separation Assay, Transbio Corporation,
Baltimore MD, U.S. Patent No. 5,869,275). Suggested substrates and their respective enzymes
include but are not limited to: Histone H1 (Sigma) and p34cdc2 kinase, Annexin I, Angiotensin (Sigma)
and EGF receptor kinase, Annexin II and src kinase, ERK1 & ERK2 substrates and MEK, and myelin
basic protein and ERK (Pearson, J.D. et al. (1991) Methods Enzymol. 200:62-81).

In another alternative, protein kinase activity of KPP is demonstrated in an assay containing 
KPP, 50 µl of kinase buffer, 1 µg substrate, such as myelin basic protein (MBP) or synthetic peptide 
substrates, 1 mM DTT, 10 µg ATP, and 0.5 µCi [γ-32P]ATP. The reaction is incubated at 30°C for 30 
minutes and stopped by pipetting onto P81 paper. The unincorporated [γ-32P]ATP is removed by 
washing and the incorporated radioactivity is measured using a scintillation counter. Alternatively, 
the reaction is stopped by heating to 100°C in the presence of SDS loading buffer and resolved on a 
12% SDS polyacrylamide gel followed by autoradiography. The amount of incorporated 32P is 
proportional to the activity of KPP. 
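
The proportionality between incorporated 32P and KPP activity can be illustrated with a short 
calculation. The sketch below is only an illustration, not part of the described method: the function 
name, the blank-subtraction step, and the specific radioactivity of the ATP stock (cpm per pmol) are 
assumptions that would be determined separately for each experiment. 

```python
def kinase_specific_activity(sample_cpm, blank_cpm, atp_cpm_per_pmol,
                             minutes, enzyme_ug):
    """Estimate pmol phosphate transferred per minute per microgram of enzyme.

    sample_cpm       -- counts from the washed P81 paper (or excised band)
    blank_cpm        -- counts from a no-enzyme control (assumed control)
    atp_cpm_per_pmol -- specific radioactivity of the ATP mix (assumed known)
    minutes          -- incubation time, e.g. 30
    enzyme_ug        -- mass of enzyme in the reaction (hypothetical input)
    """
    if minutes <= 0 or enzyme_ug <= 0 or atp_cpm_per_pmol <= 0:
        raise ValueError("time, enzyme mass and specific radioactivity must be positive")
    # Counts attributable to the enzyme; never report a negative value.
    net_cpm = max(sample_cpm - blank_cpm, 0.0)
    # Convert counts to pmol of phosphate incorporated into substrate.
    pmol_incorporated = net_cpm / atp_cpm_per_pmol
    return pmol_incorporated / minutes / enzyme_ug


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the specification.
    print(kinase_specific_activity(15500, 500, 100, 30, 0.5))
```

Subtracting a no-enzyme blank is a common design choice because counts retained nonspecifically on 
the paper would otherwise be attributed to the enzyme. 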






In yet another alternative, adenylate kinase or guanylate kinase activity of KPP may be 
measured by the incorporation of 32P from [γ-32P]ATP into ADP or GDP using a gamma radioisotope 
counter. KPP, in a kinase buffer, is incubated together with the appropriate nucleotide 
mono-phosphate substrate (AMP or GMP) and 32P-labeled ATP as the phosphate donor. The reaction 
is incubated at 37°C and terminated by addition of trichloroacetic acid. The acid extract is 
neutralized and subjected to gel electrophoresis to separate the mono-, di-, and triphosphonucleotide 
fractions. The diphosphonucleotide fraction is excised and counted. The radioactivity recovered is 
proportional to the activity of KPP. 

In yet another alternative, other assays for KPP include scintillation proximity assays (SPA), 
scintillation plate technology and filter binding assays. Useful substrates include recombinant 
proteins tagged with glutathione transferase, or synthetic peptide substrates tagged with biotin. 
Inhibitors of KPP activity, such as small organic molecules, proteins or peptides, may be identified by 
such assays. 

In another alternative, phosphatase activity of KPP is measured by the hydrolysis of 
para-nitrophenyl phosphate (PNPP). KPP is incubated together with PNPP in HEPES buffer pH 7.5, in the 
presence of 0.1% β-mercaptoethanol at 37°C for 60 min. The reaction is stopped by the addition of 6 
ml of 10 N NaOH (Diamond, R.H. et al. (1994) Mol. Cell. Biol. 14:3752-62). Alternatively, acid 
phosphatase activity of KPP is demonstrated by incubating KPP-containing extract with 100 µl of 10 
mM PNPP in 0.1 M sodium citrate, pH 4.5, and 50 µl of 40 mM NaCl at 37°C for 20 min. The 
reaction is stopped by the addition of 0.5 ml of 0.4 M glycine/NaOH, pH 10.4 (Saftig, P. et al. (1997) 
J. Biol. Chem. 272:18628-18635). The increase in light absorbance at 410 nm resulting from the 
hydrolysis of PNPP is measured using a spectrophotometer. The increase in light absorbance is 
proportional to the activity of KPP in the assay. 
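
As an illustration of how the absorbance increase relates to the amount of PNPP hydrolyzed, the 
sketch below applies the Beer-Lambert law. It is not part of the described method. The molar 
extinction coefficient (18,000 per M per cm, a value commonly quoted for p-nitrophenolate near 405 nm 
under alkaline conditions), the 1 cm path length, and the function name are assumptions; the 
coefficient should be confirmed for the wavelength and buffer actually used. 

```python
def pnpp_product_nmol(delta_absorbance, final_volume_ml,
                      epsilon_per_m_cm=18000.0, path_cm=1.0):
    """Convert an absorbance increase into nmol of p-nitrophenol released.

    delta_absorbance -- sample absorbance minus blank absorbance
    final_volume_ml  -- total volume in the cuvette after the stop solution
    epsilon_per_m_cm -- assumed molar extinction coefficient (M^-1 cm^-1)
    path_cm          -- assumed optical path length in cm
    """
    if final_volume_ml <= 0 or epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("volume, extinction coefficient and path must be positive")
    # Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l) in mol/L.
    conc_molar = delta_absorbance / (epsilon_per_m_cm * path_cm)
    # mol/L * mL = 1e-3 mol, i.e. 1e6 nmol per (mol/L * mL).
    return conc_molar * final_volume_ml * 1e6


def pnpp_rate_nmol_per_min(delta_absorbance, final_volume_ml, minutes, **kwargs):
    """Average hydrolysis rate over the incubation, in nmol per minute."""
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return pnpp_product_nmol(delta_absorbance, final_volume_ml, **kwargs) / minutes


if __name__ == "__main__":
    # Illustrative values only: a 0.36 absorbance increase in 0.65 ml after 20 min.
    print(pnpp_rate_nmol_per_min(0.36, 0.65, 20))
```

The final volume is left as an input because the volume of KPP-containing extract added to the 
reaction is not specified in the text. 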

In the alternative, KPP activity is determined by measuring the amount of phosphate removed 
from a phosphorylated protein substrate. Reactions are performed with 2 or 4 nM KPP in a final 
volume of 30 µl containing 60 mM Tris, pH 7.6, 1 mM EDTA, 1 mM EGTA, 0.1% 
β-mercaptoethanol and 10 nM substrate, 32P-labeled on serine/threonine or tyrosine, as appropriate. 
Reactions are initiated with substrate and incubated at 30°C for 10-15 min. Reactions are quenched 
with 450 µl of 4% (w/v) activated charcoal in 0.6 M HCl, 90 mM Na4P2O7, and 2 mM NaH2PO4, then 
centrifuged at 12,000 x g for 5 min. Acid-soluble 32Pi is quantified by liquid scintillation counting 
(Sinclair, C. et al. (1999) J. Biol. Chem. 274:23666-23672). 
XIX. Kinase Binding Assay 

Binding of KPP to a FLAG-CD44cyt fusion protein can be determined by incubating KPP 
with anti-KPP-conjugated immunoaffinity beads followed by incubating portions of the beads (having 
10-20 ng of protein) with 0.5 ml of a binding buffer (20 mM Tris-HCl (pH 7.4), 150 mM NaCl, 0.1% 
bovine serum albumin, and 0.05% Triton X-100) in the presence of 125I-labeled FLAG-CD44cyt 
fusion protein (5,000 cpm/ng protein) at 4°C for 5 hours. Following binding, beads are washed 
thoroughly in the binding buffer and the bead-bound radioactivity is measured in a scintillation counter 
(Bourguignon, L.Y.W. et al. (2001) J. Biol. Chem. 276:7327-7336). The amount of bead-bound 
radioactivity is proportional to the amount of bound KPP. 
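
The stated specific radioactivity of the labeled fusion protein (5,000 cpm/ng) allows bead-bound 
counts to be expressed as a mass of bound ligand. The following sketch shows that conversion. The 
function name and the background-subtraction step (counts from beads lacking KPP) are assumptions 
added for illustration. 

```python
def bound_ligand_ng(bead_cpm, background_cpm, cpm_per_ng=5000.0):
    """Convert bead-bound counts into ng of labeled fusion protein bound.

    bead_cpm       -- counts measured on washed beads carrying KPP
    background_cpm -- counts on control beads without KPP (assumed control)
    cpm_per_ng     -- specific radioactivity of the labeled fusion protein
    """
    if cpm_per_ng <= 0:
        raise ValueError("specific radioactivity must be positive")
    # Only counts above background are attributed to specific binding.
    specific_cpm = max(bead_cpm - background_cpm, 0.0)
    return specific_cpm / cpm_per_ng


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the specification.
    print(bound_ligand_ng(60000, 10000))
```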

XX. Identification of KPP Inhibitors 

Compounds to be tested are arrayed in the wells of a 384-well plate in varying concentrations 
along with an appropriate buffer and substrate, as described in the assays in Example XVII. KPP 
activity is measured for each well and the ability of each compound to inhibit KPP activity can be 
determined, as well as the dose-response kinetics. This assay could also be used to identify molecules 
which enhance KPP activity. 
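
One way to summarize the dose-response data from such a plate is to estimate the concentration giving 
50% inhibition (IC50). The sketch below is a minimal illustration under stated assumptions, not the 
method of the specification: it interpolates linearly on a log-concentration scale between the two 
tested concentrations that bracket 50% inhibition, and all names and example values are hypothetical. 
A full curve fit (for example a four-parameter logistic model) would normally be preferred when enough 
points are available. 

```python
import math


def percent_inhibition(activity, uninhibited_activity):
    """Percent loss of KPP activity relative to a no-compound control well."""
    return 100.0 * (1.0 - activity / uninhibited_activity)


def ic50_by_interpolation(concentrations, activities, uninhibited_activity):
    """Estimate IC50 by log-linear interpolation between bracketing wells.

    concentrations -- compound concentrations (all > 0, any consistent unit)
    activities     -- measured KPP activity in the matching wells
    Returns None when 50% inhibition is not bracketed by the tested range.
    """
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive for a log scale")
    previous = None
    for conc, act in sorted(zip(concentrations, activities)):
        inhibition = percent_inhibition(act, uninhibited_activity)
        if previous is not None:
            prev_conc, prev_inhibition = previous
            if prev_inhibition < 50.0 <= inhibition:
                # Fraction of the way from the lower well to the upper well.
                fraction = (50.0 - prev_inhibition) / (inhibition - prev_inhibition)
                log_ic50 = math.log10(prev_conc) + fraction * (
                    math.log10(conc) - math.log10(prev_conc))
                return 10 ** log_ic50
        previous = (conc, inhibition)
    return None


if __name__ == "__main__":
    # Illustrative values only: activity falls as compound concentration rises.
    print(ic50_by_interpolation([0.1, 1, 10, 100], [950, 750, 250, 50], 1000))
```

A compound that enhances KPP activity yields negative percent inhibition in this scheme, so the same 
per-well calculation also flags candidate activators. 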

XXI. Identification of KPP Substrates 

A KPP "substrate-trapping" assay takes advantage of the increased substrate affinity that may 
be conferred by certain mutations in the PTP signature sequence of protein tyrosine phosphatases. 
KPP bearing these mutations form a stable complex with their substrate; this complex may be isolated 
biochemically. Site-directed mutagenesis of invariant residues in the PTP signature sequence in a 
clone encoding the catalytic domain of KPP is performed using a method standard in the art or a 
commercial kit, such as the MUTA-GENE kit from BIO-RAD. For expression of KPP mutants in 
Escherichia coli, DNA fragments containing the mutation are exchanged with the corresponding 
wild-type sequence in an expression vector bearing the sequence encoding KPP or a glutathione 
S-transferase (GST)-KPP fusion protein. KPP mutants are expressed in E. coli and purified by 
chromatography. 

The expression vector is transfected into COS1 or 293 cells via calcium phosphate-mediated 
transfection with 20 µg of CsCl-purified DNA per 10-cm dish of cells or 8 µg per 6-cm dish. 
Forty-eight hours after transfection, cells are stimulated with 100 ng/ml epidermal growth factor to 
increase tyrosine phosphorylation in cells, as the tyrosine kinase EGFR is abundant in COS cells. Cells are 
lysed in 50 mM Tris-HCl, pH 7.5/5 mM EDTA/150 mM NaCl/1% Triton X-100/5 mM iodoacetic 
acid/10 mM sodium phosphate/10 mM NaF/5 µg/ml leupeptin/5 µg/ml aprotinin/1 mM benzamidine 
(1 ml per 10-cm dish, 0.5 ml per 6-cm dish). KPP is immunoprecipitated from lysates with an 
appropriate antibody. GST-KPP fusion proteins are precipitated with glutathione-Sepharose, 4 µg of 
mAb or 10 µl of beads, respectively, per mg of cell lysate. Complexes can be visualized by PAGE or 
further purified to identify substrate molecules (Flint, A.J. et al. (1997) Proc. Natl. Acad. Sci. USA 
94:1680-1685). 

Various modifications and variations of the described compositions, methods, and systems of 
the invention will be apparent to those skilled in the art without departing from the scope and spirit of 
the invention. It will be appreciated that the invention provides novel and useful proteins, and their 
encoding polynucleotides, which can be used in the drug discovery process, as well as methods for 
using these compositions for the detection, diagnosis, and treatment of diseases and conditions. 
Although the invention has been described in connection with certain embodiments, it should be 
understood that the invention as claimed should not be unduly limited to such specific embodiments. 
Nor should the description of such embodiments be considered exhaustive or limit the invention to 
the precise forms disclosed. Furthermore, elements from one embodiment can be readily recombined 
with elements from one or more other embodiments. Such combinations can form a number of 
embodiments within the scope of the invention. It is intended that the scope of the invention be 
defined by the following claims and their equivalents. 



Incyte Polypeptide ID: 7526196CD1 

Signature Sequences, Domains and Motifs [Analytical Methods and Databases] 

PROTEIN REPEAT SIGNAL PRECURSOR PRION GLYCOPROTEIN NUCLEAR GPIANCHOR BRAIN MAJOR PD001091: 
G373-P626, G404-P626, P358-Q601, P349-Q574, P320-S519, P296-Q541 [BLAST_PRODOM] 
PROTEIN KINASE DOMAIN 
DM00004|P38080|36-309: L52-I304 
DM00004|P40494|23-287: L52-I304 
DM00004|P51954|6-248: L52-I304 
DM00004|P53974|23-288: L52-I304 
Potential Phosphorylation Sites: S7, S115, S224, S235, S311, S625, S679, S785, S815, S822, S833, 
S871, S879, T47, T147, T199, T221, T240, T241, T275, T389, T395, T628, T708, T743, T757, 
T829 [MOTIFS] 
Potential Glycosylation Sites: N113, N273, N667, N703, N823, N905 [MOTIFS] 
Serine/Threonine protein kinases active-site signature: I172-L184 [MOTIFS] 
Signal Peptide: M1-G22 [HMMER] 
Signal_cleavage: M1-G22 [SPSCAN] 
Serine/threonine dehydratase pyridoxal-phosphate attachment site IPB000634: E95-S104 

Incyte Polypeptide ID: 7526198CD1 
Amino Acid Residues: 1355 

Signature Sequences, Domains and Motifs [Analytical Methods and Databases] 

CYCLIN G-ASSOCIATED KINASE TRANSFERASE SERINE/THREONINEPROTEIN ATPBINDING HSGAK 
PD026473: M1-L40 [BLAST_PRODOM] 
Potential Phosphorylation Sites: S6, S21, S62, S73, S92, S113 [MOTIFS] 
Protein kinase domain: L40-E315 
DnaJ molecular chaperone homology domain: E1290-S1351 
Serine/Threonine protein kinases, catalytic domain: L40-A317 
Eukaryotic protein kinase IPB000719: Q165-L180, I240-G250 
Protein kinases signatures and profile: V148-H200 [PROFILESCAN] 
CYCLIN G-ASSOCIATED KINASE TRANSFERASE SERINE/THREONINEPROTEIN ATPBINDING HSGAK 
PD039449: A317-N402 [BLAST_PRODOM] 













Incyte Polypeptide ID: 7526442CD1 

Signature Sequences, Domains and Motifs [Analytical Methods and Databases] 

Serine/Threonine protein kinases active-site signature: V129-I141 [MOTIFS] 
Eukaryotic protein kinase IPB000719: H119-Q134 [BLIMPS_BLOCKS] 
PROTEIN KINASE DOMAIN [BLAST_DOMO] 
DM00004|I49592|6-276: L7-R131 
DM00004|P23437|6-286: R9-R131 
DM00004|P29620|21-289: I10-P130 
DM00004|Q02399|6-276: L7-R131 
Protein kinases ATP-binding region signature: I10-K33 [MOTIFS] 







163 



wo 2004/098539 



PCTAJS2004/009215 




164 



wo 2004/098539 



PCT/US2004/009215 



1— I 



I 



i 



ON 



OS 
I 

ON 



O 

CO 
CO 



CO 

*o 
o" 

CO 
CO 



i 

6 

to 
o\ 



CO 
CO 

oo 
o\ 

CO 
I 

CM 

in 



oo 
m 
oo 



to 
to 



oo 



oo 

NO 



oo 
I 

CO 



»o 
o\ 



ON 
CO 



oo 

I 

i-H 

ON 



22 ^ 



CO Ti* 



to VO 



VO 



VO 



VO 



^ ^ oo 

2 2 i 

. Tt- ^ 

OO c>J in VO 

S: o\ VO 

1 CO to CSJ 

'-H <si CO to 



VO 



o 



o 

CO 
CO 

CM 
VO 



ON 
CO 



oo 

d> 

VO 
CO 



oo 

VO 



to 
to 

VO 

to 
to 
cs 



CO 

oo 



to 

ON 



VO 



VO 

On 



to 

s 



CS 



oo 



VO 
oo 
VO 

oo 
VO 



^ T-4 m 

S S 2 

^ ^ CO 

o 

CO g Tt 

S 

Tj- 1-H CO 

oo 

oo ON On 

i-H ^ c>) 



5o 



^ to OO VO 

^ CO CO « 

t*^ oo O 

^ ^ CO 

2 CO fsT VO 

3 2 S S 

^ - ?} s 

oo " I - 

S ^ to 

;i3 (N CO VO 



vb s ^ 



rj go ^ 



ON ^ to 

S c^i 



?3 S 



6 ^ 



i 



-o 

8 



I 

CO 



•oqQ 



<2 



8 

OO 
CO 



^ VO 
CO 



8 

to 
oo 
VO 
oo 



8 

to 

ON VO 

to 



e 

oo 

I 

cs 
to 

VO to 



8 

VO 

oo 
to 

— to 

S c3 



8 



o 
to 

VO cs 



8 

CO 



ON 

CO 

VO — < 



8 

VO 

oo 
to 



8 



Ov 
CO 



<0 CO 



165 



wo 2004/098539 



PCT/US2004/009215 



i 

CO 



cn 

O 

o 

I— I 



On 
ON 

1 



O 

o 



o 
o 



vo 



o 

3 



vo 



00 



5 



ON 

oC 

ON 

o 

I 



VO 



vo 



vo I CO 

5§ ? S 

« VO »-i 

as ^ ^ 

vo oo 1-H 

o Tj- 

r-H i-« lO 

O I I 

»o 

^ ^ S;: 

vo OV 

oo „ ^ 

irj oo cn 

® ^ f2 

f-H o vo 



CN On m 

► to ON 

oo .. 
oo o 

oo lO ON 



oo I I 

m ^ w-> 

• Tj- vo 

vo lO ON 

ON in cn 

^1 s 

CO I I 

^ On »n 

On O vo 

»o ON 

On 

• ON ON 

5i ^2 
^ (N vo 
CN ^ i-H 

^ ^ 

ON vo 
Op ON 

cs t-> 

2 5 3 
^ ^ s 

§ ?f :s 



»r> in 

cn r-< 

VO d ON 

4 

oo ^ 

of <S vo 

On vq ^ 

ON <N On 



c-* OO 

2 3 
Z2 



*. 1 
oo m 

oo 1— « 



4 cb 

ON 



OO 

§ 

OO 

ON ^2 



o r;- 

On O ^ 

m <s o 

^ ^ cs 

°^ 

S:: m 

^ ^ 

irT ^ ON 

CN O r- 
o 

1-H 1-4 

r-" rj; 

^ in ^ 

— S2 

1-H ^ 1-H 

ON VO cn 

OO ON r-- 

vo ^ ON 

1-H 1-H 1-H 

2 

^ r-. CO 

1-H ^ 1-H 

oo c^l vrT 
»o vo 

vo i-H oo 



^ ; o 



vo 
m oo 

O CO 



^ o> ^• 
^ 3t 

^ oo <^ 



U-) ^ ^ 

oo vo vo 

lO ^ ON 




5! 

ON 

VO 



CM 
I 

ON 
CO 

vo 



CO 

vo 



§ 

ON 
CO 
NO 



o 

s 



s 



vo 
oo 

CO 

cs 

i 

CO 

oo 



oo 

1-H 

CO 
CO 



CO 
ON 
CO 
CM 
I 

vo 
vo 
vo 



CO vi 

ON «0 
CO 

CN CSI 

s s[ 

ON r*^ 

"1 ^ 

CO oo 

ON 

CO ^ 

s ON 

CO vo 

ON oo 
CO O 
CM 

i25 

OO ^ 

"1 ^ 

CO cf\ 

o\ <s 

CO Q 



CO 
CO 



f2 

CO 



CM 

CO CO 

I 

oo 



o 

CO 
CM 



ON 

o 

I 

oo 

CO 

vo 

CO 



vo 

f 



oo 
oo 



O m 1-H 

vo On CM 

CO o 

I— I CM CM 



CO 



CO 



O CM 

ir> CO 

o vo 

CO CO 

CO 

CM CM 

VO 

CO CO 

t ^ 

CO ^ 

CO — 
CM 



C^ CO 

ON oo 

CM CM 



^ oo 

CO 

m 



si 

^ CM 
CO i 



ON 
CO 
CM 



O 
CM 
CO 
CM 



CO 

CM 
t 

CO 

On 



oo 
CM 
I 

o^ 

CM 

vo 

oo 

<^ 

c<J 

CO 
CM 



CO 

i 

CO 
CO 

o 

CO 



CO Q 

i 



s 59 

CO vo 

On »— • 

CO ON 

CM *-H 

gN ON 

^ CM 



o S5 12 
r-- r3 ON 

CM 1— I 



O irj 

CM ^ 

CM ON 

CO CM 

2 oC 

CM CO 

». 1 

O CO 

SVO 
00 

CO CM 

<i 

s s 

CO CM 

O 1-H 

00 CO 

1-H 

CM CM 



1-H 



CM 



ON 
VO 
CM 

5' 



^ ""1 

CM VO 

CO 5o 

r- S ON 

^ S ^ 

ON vo 

ON ON 00 

CM 

CM 1-1 CM 



CO ON VO 

O ON 00 

CO 

CM ^ CM 



JS ^ s 

^ ^ 

^ CO ON 

cS in ^ 

ON 00 00 

— 

C^ 

CO a\ 1-H 

vo ON 
vr% *iJ iirt 



8 



»-H CM CM 



- VO », 

s s ^ 

- s ^ 

I vo » 

?5 2 S 

xn ^00 

„ 1-H 1-H 

VO CM 

1-H 1-H ON 

^ • «o 

I 1-H 

CO 

CO 

in - i> 

* 1-H O 

1— < »0 CM 

- ^ Ci 

n: <^ p; 

^ 10 ^ 

^ vo 
»o ^00 

J. 1-H vo 

CM m 00 

CO 1-H 1-H 

ON ^ A 

CO xj- r-. 

CO CO 

vo ^ 



CM 



ON CO 

ON r-- 



CO 



ON 

5B 

CM 

CM 
CO 



vo 

CM 

CO 
CO 

O 

CO 



00 

^ ON 

CO ^ 
_f> I 

VO 

rt CO 
3^ 



CO 

vo 



CM 
ON 

00 



CO 
CO 
I 



00 CO 
CM O 

f 5 

VO 
CM- o 

^ 

06 CM 
On IT) 

vo 1-H 
CO 

CO 

S « 

r- 

CO 

vo 
in 

Ov 
CM 



CO 



CO 



CM 
00 
CO 
CM 



^ " ^ 

Vd ON 1-H 

00 vo ON 

r- 

CM 1-H CM 



O CM 

r-. o 

ON 

CO CO 

52 ON 

On 
CM 



ON 
NO 
CO 
CM 

uo 
00 

?3 



CM 

VO 
00 

CJN 



00 



s 

o\ 

CO 

CM 

00 



CO 
CO I-H 

00 

CO CO 



ON 

00 

CN 



00 



5 _ 

CO 00 

CM CO CO 



g2 ?5 

CM 



CM 

s 

CM 

1-H 

CM 



CO CO 



I 

'S ---^ 

i!5 2 ^ 



8 

1-H 

CM 



8 

CO 
vo 

1-H 

VO 
CM 

m 

^ CO 

T— I 

99 ?3 
vo 



8 

00 
wo 



vo 

CM 

m 

1-H 

ON 

ON o\ 
vo to 



166 



wo 2004/098539 



PCT/US2004/009215 



I 



r2 

a> 
u 

1 



VO 



3 



:t 2 a 



en ^ 



v|» cn 3 
o\ * On 



CM 



^ J^ 

so *0 



? O CNf 2 

^ ^ ^ 



csf ON ^ jq 

• o od ^ 

. 2 o 2 

s ^ 



3.2 
f 



_ o 
f— I oo 
* oo . 

g <A 1 

S «n 
^- 5J 



OO 



in ~ 

O ON 

NO O 

so ^ 

CS ON 

^ n6 

OO S cs 



« \0 O O OO <s 

53 S S g S 
S 3 ^ i S 

m ^ 3. S 
^oooou^ONVOmt^ 
^ wri ^ m >8 3 



ON OO 

I OO 

OO » > 

m ON 

r-- o 



CM «^ 



C>J CO 

»n oo cj V] 

Ov <N ^ 

2 2 

S a 



ON _ . 

OO ON 



I I CN *^ 

!0 £r» ^ S2 



- § 

S S ^ 

VO ^ ej 

S oo 



m ^ 
1— • VO oo 

c<i m 



cn 

t-^ CN in 
<N '-^ ^ 



:i a 
^ a 



4 

>n »n , _ 

oo ON ^ 

. . OO ^ ^ .'^ 

t-- 3; ON 2j ^ 

V ^=H ^ 5-4 ^ 
o ^ ^ ' " 

I ^ T-H »-H 

On 

r*. oo 



ON 

ro 



s ?2 

On oo 
VO On 



cn 



cn 
<s in 

VO ^ 
1— t 

^ oo 
OQ 



CO 
o 

o 

On CO 
CO in 

NO 



CO 

2 o 

OO 1-H I 

CO ^ 2^ 

^ ^ p: 



ON 
CO 

- On ^ ^ w , 

^ * «n £j m 

O *-H V 

CS S £^ 3 

II ^ in 

CS ON ^ I »-i 

S S? oC ;A . 

r-- cn » ^ ? 

^ i-L -J> 

^ ^ 3 

?^ s ^ 

oo On r. . 

VO - 



Q^^fo-^voi-HirroN 
tCt^inmcom^^o 



. - m in 

O oo 1— i ^ 

rsj VO ' • 

?! ^ 

^ s: OO 

^ oo ... 

i-4 VO r oo T-H 



t O T-H 

Q CN r-* 

r<i ^ VO 

On 1-H 

^ oo NO 



- VO VO oo in S ^ . 
in^Tfoin£Jvo 

vo!iov^^ONr^oo 

•'S^:s:ss25r^oN 

OFtr^^OOONOi— ICO 
0\V0 • ^ ^i-HCSI,— I 



CM 

^ in in CO 

oo CO O 



oo « 

VO OV 

m a ^ 

»-H CO £M VO 

*o ± - » 

O NO ON oo 

ON CO On 

oo =^ ^ 




^ oo CO 

ON O oo 

m OO 
^ 

in OO »o 

i s 

C>1 ^ CM 

ON in o 

oo ON OO 

m o — 

^ CM 



1-H 



"4 

g a 

CM O oo 

^ OO O 
^ oo 

<S ^ CM 

4 CM d> 

ON OO 

r** o ON 



ON <N 

OO QQ 



2^ ^ 

ON NO CO 

»n OO 

^ CM ^ 

CO 
CM 



■ r> I 

1-H On 

po in CM 

in ^ oo 

^ CM ^-H 

^ g S 

£1: 

Cjl ^ CM 

CM On VO 

oo NO 1-H 



Q O 



_ oo 
oo o 

2 S 



S S 

? g & 

CM 

o 6 

oo oo CM 

m ON oo Tj- 

^ ^ CM 



^ oo 
O 

CM ^ CM 

OO oo" o 

^ 
m o 

CM 



CO On 
VO 



S VO 
CM 

OO oo 

r-* O 
m 

^ c>i 

55 S 

S i2 

CM ^ 

oo CO ON 

r** in NO 

m - 

r-H CM 



^ o in oo 

o ^ CM 1— « 

s ^ s g s 
;4 2 pj s 

CM pj CM 

^ S po ^ ^ 

^ Q m oo 

Tt CM 1-H CM ^ 



NO 



«o ^ . 
^ S ^ ^ 



On ^ • 
^ S 

1-H o VO 



ON r-< 

VO O 



NO 
CM 



I 

OO 

oo 

^ c^ 
^ 5Q 

1-H ON 

CM 



int*^»-<CM^'-Hi-Hi--C 

ON^^CSO 

irj^^i-Hi-Hir)^ON 

CO ^ 1— I 
VO CM vD 



in 

cm" cm 
S 

i-T od 

« 2 

in i-H 

* ' • 

r*^ ON 

i 5g {2 



On c^ 
CO On 



^ CO 
VO ON 
^ CO 



o 2 



1-H 1-H « I 



ON 

oo 
oo 



ov 

CM 



^ ^ O 

O CM »-H CO 1-H irj 

po CM f-H ^ ^ O 

in f-i ^ ^ ^ ^ 



VO CO 



^ m 

oo o 

" S 

^ ON 



in o 

VO m ^ 

oo oo 1-H 

^ 1-H CM 



VO CO 



0\ 

: VO 



CO 

in o 

1-H 

ro" ^ 
^ § 

1-H ^ 
I 

CO 1-H 

CM CO 

in ON 



I 

CM 

r:! 

oo 1-H 

^ CM 
vi 



oo 



r- S in VO 

»— ' ON 

ON S O ON 

1-H *— < 



^ m ^ OO ^ 

oo C;4 r-< CM ^ 

& a oo' oo" oo 

CO CM NO On 

^ CO g ^ VO 

*n cnJ ^ 

oo " • - 

^ r-- o m NO Tj" 

,-H ^ VO On oo r«* 

^ o m CO CM 

un CM CM - 

ON CO ^ 

J-j J2 ^5 a 



^ s 

O ON 

CM ^ 

VO CO 

ON in 




. , *o ^ in 
o ^ 

3 t-^ oj oo" 4 



r- cM 
VO »o 
^ oo 



CM 



cy ^ 



i 

oo 

ON 
»-H 



e 

o 
oo 

NO 
CM 

!0 

CO 

^ in 



8 

S3 



O 



167 



wo 2004/098539 



PCT/US2004/009215 




168 



wo 2004/098539 



PCT/US2004/009215 



I 

i 
I 

CO 



!2 3 

^ CO 0\ 

irT oo 
u< ON 
^ vo oo 



r*- ?o cr> 



oo 0\ 



oo On O 



1-H ^ ^ 

?3 S 

S8 S ^ 

•» On CO r** 

so r>i vo in 



oo ON 



CO 

oo 

g 

so 

ON 
CO 



On 



On 



oo 
oo 



vo ^ 
r-> CO 

vo 



as 
vo 

»rr 

oo 

CO 



ON 



-i. s 



2" S i 2 



CO 



CO O 

\n oo 

CO CM 

Jo 

t-( oo 

c^ <s 

3 



uo 
vo 
CO 



vo »n 

^ o 

CO 



o\ 



00 

co 



oo 
oo 

ON 

»n 

I 

OO 



oo ON 
OO OO 
* CO 
CO CO 



4 o ^ 

vo oo 00 



^ i: V s s 



CI 

oo 

oo cs 

CO 



S5 



2 S 



vo 
00 



00 so 
00 

o 
00 

CO 
CO 

4 



4 

t-4 ON 

, CO C^I 

R IS 
?5 



ON ^ 



00 CS CO cs 



^ nl ^ 

CO 



in 
m 
1—1 
I 

»n 
00 

CO 



>n 

2 S S 



ON 
VO 

r- 

so 
m 
so 



CO 
CO 
CO 

00 ^ 
so v£) 
00 vo 
* CM 

00 

00 vo 
vo in 

CO 

00 I 

VO CO 
00 00 
in 



so 

. CO CN 

? 3 



i>> m S ^ 
^ vo 00 00 ^ 

^ T-H t-4 so 



00 

I 

so 



00 00 



^ 00 

^ so 
so 

00 V 

^ in 

O so 

so ^-H 

r-( ^ 

CO f*^ 
00 
vo 

vo 



CS CO 
ON O^ 

in vo 
vo 

CO C>1 

00 00 

so 
CO cs 

4 00" 

<N Jo 

00 
VO 
VO 



5 _ 

CO cs 
^ <N 



- 2 



.2906-; 


•5355, 




0 






m 






so 






CO 

Si 






00 


CO 

f— 1 






m 






d> 






CM 




so 






CO 






si 


0 




00 

CM 






CO 






m 


vo 




vo 






ro 


? 




si 






4,287 


-455 








m 


ON 




so 






CO 


CO 




4 






in 
00 






CO 


*r 






vo 


m 






vo 






CO 

t 


CO 




00 




ON 




CO 
00 




CO 






1 


1 

0 


in 


00 


in 


vo 




1-H 


CO 


CO 




2849- 




in 


00 


in 


so 


m 


CO 




00' 


1 

00 






0 






0 




CO 


CO 


vo 


,2849- 




0 


m 


m 


vo 


CM 


CO 


r*- 




1 

r— r 










vo 
CO 


S; 


1 


,2848- 


654, 


772, 


CO 


VO 


CO 




1 

0 






1-H 

ON 


CO 


CM 


»n 


,2823- 






m 


CO 


vo 


ON 




V 














vo 






CO 




in 


<i 




0" 








ON 


00 


vo 










m 


S 




1-H 




vo 






CO 


CM 


5 



CO 
CM 

o 

CM 



SO 
CM 

CTV 
00 
00 



CO 

s 

in 



o 
00 

CO 

m 



m 

CM 



in 

T-H 

CM 

i 

vo 
ON 
ON 
I 

CO 

S ^ 

O vo 
*n 

CO 00 

2 ^ 

0\ ^ 

CO fM 

ON CM 
CM 

try oT 



CM 

ON <^ 

1-H ^ 

. CM 
CM C 
t-H CM 



2S 



2 !^ 

4 S 

ON * 
1-H 



o 

m , 

^ m 

CM ^ 

o si 

*n rt 

^ c^ 
in 

2 CO 

^ CM 

CM 

CO in 



00 

1-H 00 

CO vo 

CSJ ^ 

2 CM 

1 CM 

ON d\ 

00 r-* 

1-H VO 

CM 



r-. ^ 



O 
1-H 



<3i 

so 
00 

00 
in 



CO 
CM 

I 

CO 



CM 



o 

CM 
I 

CO 



s 

o 

CM 



^ CM 

in 

CO in 

» I 

"St CM 

1-H r^t 

CO OV 

1-H CO 

CO m 

CM 



in 

CO so 

S 

CM 1-H 

00 so 

CM VO 

CM ^ 



tr\ CM 
ON 

12 ^ 

s6 

»-H iTi 

CO vo 

CM 1-H 
I 

00 1-H 

ON vo 

CO CO 

o in 

CO so 

CM ^ 



: ^ S ^ 
r ^ ^ ^ 



— CO 

^ CM 

«^ I 

CO ON 

^ CM 

CO in 

CM ^ 



OV 

in 



^ 3 

CM ^ 

5S co" 

iCJ CO 



1-H ^ 1-H 



1-H CM ^ 

1-H 00 1-H 

CO 3 CO 

CM ^ CM 

in ci 

00 ^ CM 

CO CO »n 

^ CM T-H 

1-H ^ 

4* crv 

»-H 1-H 

CO m 

CM ^ 

_L 

CM 

CO m 

1-H 1—1 

CO m 

CM 



CO 

CO 
CO 



CO 

o 

CS 



i CO CO in 

f-H ^ CM «-H 



5: ^ 

3 CO 

^ CNl 

^ OS 

^ o 



00 

CO 

ro vo 

00 Q 

in ^ 

^ CM 

I 

CO 1-H 
r-l CO 
CO VO 

CM 1-H 
00 <^ 

VO 5^ 

1-H ^ 

r> I 

m 

00 CM 

1-H VO 

CM i-H 
!2 CO 
1-H 

CM VO 
CM 1-H 

^ VO 



vo 

1-H f-« 



00 

a :s a 



o 



So 

ON 

00 
CM 

in 



CM 



vo 



«n 
On 



i 

CO 



8 

CO 



Q § ^ 
<S S >s >3 



CM 

in 

^ T-H 

ON O 
00 




8 

00 

CM 
CM 
SO 
CM 

in 

00 CO 



;=5 
8 

CM 

m 

00 CO 



8 

00 

m 
r-^ in 
^ m 

CO ON 

00 
























PID      EST ID     SNP ID       Amino Acid
7517831  142314T6   SNP00003755  noncoding
7517831  142314T6   SNP00098537  noncoding
7517831  142314T6   SNP00149767  noncoding
7517831  1531602H1  SNP00023921  noncoding
7517831  2655558T6  SNP00003755  noncoding
7517831  2655558T6  SNP00149767  noncoding
7517831  2829606H1  SNP00027387  D18
7517831  2836842T6  SNP00003755  noncoding
7517831  2836842T6  [illegible]  noncoding
7517831  2876073T6  SNP00003755  noncoding
7517831  2876073T6  SNP00149767  noncoding
7517831  7758626H1  SNP00126822  noncoding
7520272  1265056T6  SNP00065601  G274
7520272  1501560T6  SNP00069832  noncoding
7520272  1501560T6  SNP00075533  S275
7520272  1968576T6  SNP00075533  [illegible]
7523965  1238421H1  SNP00075756  noncoding
7523965  1324236T6  [illegible]  noncoding
7523965  1394758F6  SNP00100133  M164
7523965  1394758T6  SNP00033242  noncoding
7523965  1631511T6  SNP00033242  noncoding
7523965  1964258H1  SNP00033242  noncoding
7523965  1964258H1  SNP00136906  noncoding
7523965  3149675H1  SNP00057801  noncoding
7523965  3149675H1  SNP00096467  noncoding
7523965  6595879J1  SNP00075756  noncoding
7523965  759508T6   SNP00033242  noncoding
7523965  7636827H1  SNP00033242  noncoding
7523965  7636827H1  SNP00136906  noncoding

[EST Allele, Allele, and Caucasian, African, Asian, and Hispanic Allele 1 frequency columns illegible in source.]


































PID      EST ID     SNP ID       Amino Acid
7526180  1458121H1  SNP00146630  noncoding
7526180  1458121H1  SNP00146631  noncoding
7526180  1663635F6  SNP00040633  noncoding
7526180  1663635F6  SNP00040634  noncoding
7526180  2013516T6  SNP00007120  [illegible]
7526180  2013516T6  SNP00049608  noncoding
7526180  2254891H1  SNP00048399  L182
7526180  2254891H1  SNP00096777  V207
7526180  2254891R6  SNP00127250  M135
7526180  257026H1   SNP00007121  noncoding
7526180  257026H1   SNP00096771  noncoding
7526180  4712047F6  SNP00096075  S291
7526180  6094776H1  SNP00146629  E486
7526180  6355618F6  SNP00102652  noncoding
7526180  6355618F6  SNP00148688  noncoding
7526180  6731321H1  SNP00096773  K395
7526185  125901F1   SNP00047602  noncoding
7526185  1553407H1  SNP00155225  noncoding
7526185  2197671T6  SNP00155225  noncoding
7526185  6723530H1  SNP00051188  noncoding
7526185  829638T6   SNP00155225  noncoding
7526192  1208904H1  SNP00062572  noncoding
7526192  1223444H1  SNP00098139  noncoding
7526192  1231274R6  SNP00115694  S113
7526192  1341206H1  SNP00068492  noncoding
7526192  1405367T6  SNP00062572  noncoding
7526192  1405367T6  SNP00098139  noncoding
7526192  1417137T6  SNP00062572  noncoding
7526192  1553058T6  SNP00062572  noncoding

[EST Allele, Allele, and Caucasian, African, Asian, and Hispanic Allele 1 frequency columns illegible in source.]






PID      EST ID     SNP ID       Amino Acid
7526192  1678219T6  SNP00098139  noncoding
7526192  1722718F6  SNP00068491  noncoding
7526192  1722718F6  SNP00068492  noncoding
7526192  1722718H1  SNP00068491  noncoding
7526192  2997552T6  SNP00062572  noncoding
7526192  2997552T6  SNP00098139  noncoding
7526192  7674218H2  SNP00068492  noncoding
7526193  1328791H1  SNP00057788  noncoding
7526193  4291033F6  SNP00142508  N232
7526193  4291033F6  SNP00142509  T241
7526193  7217965H1  SNP00118120  noncoding
7526193  7760201H1  SNP00057788  noncoding
7526196  1216956H1  SNP00006796  noncoding
7526196  1224406H1  SNP00124328  noncoding
7526196  1436210H1  SNP00006288  noncoding
7526196  1438205H1  SNP00124330  noncoding
7526196  1555235H1  SNP00006287  noncoding
7526196  1597263F6  SNP00124329  noncoding
7526196  1669032H1  SNP00124327  noncoding
7526196  2555446F6  SNP00068980  noncoding
7526196  2555446H1  SNP00068980  noncoding
7526196  3337906H1  SNP00153438  noncoding
7526196  3643184H1  SNP00068979  noncoding
7526196  6754284H1  SNP00068978  noncoding
7526196  7703764J1  SNP00124330  noncoding
7526196  7753868H1  SNP00124329  noncoding
7526196  7753868H1  SNP00124330  noncoding
7526196  8598525H1  SNP00006287  noncoding
7526196  8598525H1  SNP00006288  noncoding

[EST Allele, Allele, and Caucasian, African, Asian, and Hispanic Allele 1 frequency columns illegible in source.]






PID      EST ID       SNP ID       Amino Acid
7526198  1216956H1    SNP00006796  K1309
7526198  1224406H1    SNP00124328  K696
7526198  1436210H1    SNP00006288  A1045
7526198  1438205H1    SNP00124330  [illegible]
7526198  1555235H1    SNP00006287  S1001
7526198  1597263F6    SNP00124329  N809
7526198  1669032H1    SNP00124327  noncoding
7526198  1806969T6    SNP00029581  noncoding
7526198  [illegible]  SNP00092542  D1341
7526198  2005750H1    SNP00029581  noncoding
7526198  2189973H1    SNP00136926  A1317
7526198  2555446F6    SNP00068980  N487
7526198  2555446H1    SNP00068980  Y486
7526198  2936740H1    SNP00006289  T1244
7526198  3337906H1    SNP00153438  R596
7526198  3643184H1    SNP00068979  S399
7526198  6754284H1    SNP00068978  G266
7526198  7703764J1    SNP00124330  Q910
7526198  7753868H1    SNP00124329  D818
7526198  7753868H1    SNP00124330  H917
7526198  8598525H1    SNP00006287  D1010
7526198  8598525H1    SNP00006288  T1054
7526208  1236920F1    SNP00033469  noncoding
7526208  1284901H1    SNP00013862  Q454
7526208  1915448H1    SNP00003491  noncoding
7526208  2528372H1    SNP00053975  noncoding
7526208  2681418H1    SNP00053974  L348
7526208  2749684F6    SNP00153340  W218
7526208  3331418H1    SNP00132658  noncoding

[EST Allele, Allele, and Caucasian, African, Asian, and Hispanic Allele 1 frequency columns illegible in source.]






























PID      EST ID     SNP ID       Amino Acid
7526246  4407121H1  SNP00041996  noncoding
7526246  7621966J1  SNP00068997  noncoding
7526246  7751044H1  SNP00068998  noncoding
7526258  1348638F6  SNP00076027  G74
7526258  1348638F6  SNP00132757  R14
7526258  1444773H1  SNP00037439  L98
7526258  1897166H1  SNP00043983  P362
7526258  2770947H1  SNP00154171  D335
7526258  3143852H1  SNP00037440  L348
7526258  3143852H1  SNP00111294  L334
7526311  1649261F6  SNP00019740  noncoding
7526311  1649261T6  SNP00019740  noncoding
7526311  268900T6   SNP00019740  noncoding
7526311  2745158H1  SNP00058093  noncoding
7526311  2745158H1  SNP00114001  noncoding
7526311  2921293T6  SNP00019740  noncoding
7526311  8011285H1  SNP00125603  A129
7526315  058064H1   SNP00003740  noncoding
7526315  1004004H1  SNP00012539  noncoding
7526315  1004004H1  SNP00012540  noncoding
7526315  1004004H1  SNP00045700  noncoding
7526315  1330039H1  SNP00045701  noncoding
7526315  1363254H1  SNP00022215  noncoding
7526315  1377277F1  SNP00012538  noncoding
7526315  1675313F6  SNP00028237  noncoding
7526315  1675313F6  SNP00028238  noncoding
7526315  1675313T6  SNP00012538  noncoding
7526315  1682961T7  SNP00045701  noncoding
7526315  3003741H1  SNP00023889  [illegible]

[EST Allele, Allele, and Caucasian, African, Asian, and Hispanic Allele 1 frequency columns illegible in source.]










